Introduction
Deep learning has driven breakthroughs in AI research and applications, transforming many aspects of modern life. TensorFlow, Google's open-source deep learning framework, is one of the most popular tools in the field. It is widely used for image classification, audio processing, recommendation systems, and natural language processing. Despite its power, TensorFlow has a relatively gentle learning curve for anyone who knows Python and has a basic understanding of machine learning and neural networks. This guide provides a foundational introduction to TensorFlow.
Getting to Know TensorFlow
Installation
TensorFlow runs on Python 2.7 or 3.x, and supports 64-bit Linux, macOS, and Windows. There are two main package variants: tensorflow (CPU-only) and tensorflow-gpu (GPU-accelerated). For production, the GPU version is recommended for its computational power, but it requires installing CUDA Toolkit and cuDNN. The CPU version is simpler to install, typically via pip install tensorflow. If you encounter issues, search online for solutions based on the error messages.
After installation, verify it by running Python and importing TensorFlow:
>>> import tensorflow as tf
This import statement is a standard convention used throughout this guide.
TensorFlow's Computational Model
In TensorFlow, computation is expressed as a graph of operations. Let's compute c = a + b where a = 3 and b = 2.
>>> a = tf.constant(3)
>>> b = tf.constant(2)
>>> c = a + b
>>> sess = tf.Session()
>>> print(sess.run(c))
5
Unlike Python's immediate print(3+2), TensorFlow requires defining operations and then executing them via a Session. The objects a, b, and c are Tensors—multidimensional arrays that are the core data structure. A scalar is a 0-D tensor, a vector is a 1-D tensor, and a matrix is a 2-D tensor. In deep learning, data like weights, biases, and images are represented as tensors.
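To make the rank terminology concrete, here is a small NumPy sketch (NumPy arrays follow the same shape and rank conventions as TensorFlow tensors; this illustrates the concept, not the TensorFlow API):

```python
import numpy as np

# Tensor ranks illustrated with NumPy arrays.
scalar = np.array(5)                  # 0-D tensor: a single number
vector = np.array([1, 2, 3])          # 1-D tensor: shape (3,)
matrix = np.array([[1, 2], [3, 4]])   # 2-D tensor: shape (2, 2)

print(scalar.ndim, vector.ndim, matrix.ndim)  # 0 1 2
```

A 28x28 grayscale MNIST image, for instance, is naturally a 2-D tensor, and a batch of such images adds one more dimension.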
The name "TensorFlow" reflects tensors flowing through a computational graph. Each node in the graph is an operation (like addition), and tensors are the inputs and outputs. Execution happens in two phases: building the graph and then running it via a session. This design allows efficient execution, especially on GPUs, by minimizing data transfer overhead.
Instead of sess.run(c), you can use c.eval(session=sess). For convenience, you can create an interactive session:
>>> sess = tf.InteractiveSession()
>>> print(c.eval())
5
To make computations parameterizable, use tf.placeholder:
>>> a = tf.placeholder(tf.int32)
>>> b = tf.placeholder(tf.int32)
>>> c = a + b
>>> print(c.eval({a:3, b:2}))
5
For updatable parameters, use tf.Variable. Unlike constants, variables hold state that can change during execution, and they must be explicitly initialized by running an initializer op in a session:
>>> a = tf.Variable(3)
>>> b = tf.Variable(2)
>>> c = a + b
>>> init = tf.global_variables_initializer()
>>> sess.run(init)
>>> print(c.eval())
5
Variables are typically used for model parameters like weights and biases that are optimized during training.
Machine Learning with TensorFlow: MNIST Example
Loading Data
The MNIST dataset contains 70,000 labeled images of handwritten digits (0–9). TensorFlow provides a helper function to download and load it:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
The data is split into 55,000 training images, 5,000 validation images, and 10,000 test images. Each image is 28x28 pixels, flattened into a 784-element vector. Pixel values are normalized between 0 and 1. Labels are one-hot vectors of length 10 (e.g., digit 3 is [0,0,0,1,0,0,0,0,0,0]).
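The one-hot encoding described above is easy to reproduce by hand. A minimal NumPy sketch (a stand-in illustration, not part of the input_data helper):

```python
import numpy as np

def one_hot(digit, num_classes=10):
    """Return the one-hot label vector for a digit, as described above."""
    vec = np.zeros(num_classes)
    vec[digit] = 1.0
    return vec

print(one_hot(3))  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```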
Building a Softmax Regression Model
We'll use a simple Softmax regression model for classification. The model computes:
y = softmax(xW + b)
Where x is the input vector (784 pixels), W is a weight matrix (784x10), b is a bias vector of length 10, and softmax converts the raw scores into a probability distribution over the ten digit classes. (The order xW matches the code below, where each row of x is one input.)
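Softmax itself is just exponentiation followed by normalization. A minimal NumPy sketch of the math (the values are illustrative, and this is not the TensorFlow API):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is unchanged
    # because softmax is invariant to shifting all logits equally.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)   # approximately [0.659, 0.242, 0.099]
```

The outputs are non-negative and sum to one, so they can be read as class probabilities; the largest logit always gets the highest probability.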
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
We define a loss function to minimize. For classification, cross-entropy is common:
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
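To make the cross-entropy formula concrete, here is a NumPy sketch on a single hypothetical example (values chosen purely for illustration):

```python
import numpy as np

# Hypothetical single example: the true label is class 2 (one-hot),
# and the model assigns that class probability 0.7.
y_true = np.array([0.0, 0.0, 1.0])
y_pred = np.array([0.1, 0.2, 0.7])

# Only the true class contributes, so this reduces to -log(0.7).
cross_entropy = -np.sum(y_true * np.log(y_pred))
print(round(cross_entropy, 4))  # 0.3567
```

The closer the predicted probability of the true class is to 1, the smaller the loss; a confident wrong prediction is penalized heavily.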
Computing tf.log(y) directly can be numerically unstable (a predicted probability of 0 yields -inf), so TensorFlow provides a combined function that handles this safely:
y = tf.matmul(x, W) + b
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
Training the Model
We use gradient descent to minimize the loss:
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
Training involves running the optimizer in a session:
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
We use mini-batch stochastic gradient descent for efficiency.
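To see what a single training step does under the hood, here is a NumPy sketch of one mini-batch gradient-descent update for softmax regression on random stand-in data. This illustrates the math that the optimizer automates; the gradient formulas are the standard ones for softmax combined with cross-entropy, and the data here is random, not MNIST:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(100, 784)            # one stand-in mini-batch of flattened images
labels = rng.randint(0, 10, 100)
Y = np.eye(10)[labels]            # one-hot targets

W = np.zeros((784, 10))
b = np.zeros(10)

def forward(X):
    logits = X @ W + b
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss(P, Y):
    return -np.mean(np.sum(Y * np.log(P + 1e-12), axis=1))

before = loss(forward(X), Y)
# Gradient of the mean cross-entropy w.r.t. W and b for softmax regression.
P = forward(X)
gW = X.T @ (P - Y) / len(X)
gb = (P - Y).mean(axis=0)
W -= 0.01 * gW                    # one gradient-descent step, learning rate 0.01
b -= 0.01 * gb
after = loss(forward(X), Y)
```

With W and b initialized to zero, every class starts with probability 0.1, so the initial loss is ln(10) ≈ 2.303; each step nudges the parameters in the direction that lowers the loss.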
Evaluating the Model
Accuracy is computed by comparing predictions to true labels:
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
This simple model achieves about 91% accuracy.
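The accuracy computation above is easy to verify on toy data. A NumPy sketch with three made-up predictions (argmax turns probability vectors, or one-hot labels, back into class indices):

```python
import numpy as np

predictions = np.array([[0.1, 0.8, 0.1],    # predicts class 1
                        [0.7, 0.2, 0.1],    # predicts class 0
                        [0.2, 0.3, 0.5]])   # predicts class 2
labels = np.array([[0, 1, 0],               # true class 1 -> correct
                   [0, 1, 0],               # true class 1 -> wrong
                   [0, 0, 1]])              # true class 2 -> correct

correct = np.argmax(predictions, axis=1) == np.argmax(labels, axis=1)
accuracy = correct.astype(np.float32).mean()  # 2 of 3 correct
```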
Deep Learning with TensorFlow: Convolutional Neural Network
CNN Basics
Convolutional Neural Networks (CNNs) are more effective than plain fully connected models for image tasks. They use convolutional layers to extract local features and pooling layers to reduce spatial dimensionality. ReLU is the most commonly used activation function in CNNs.
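The core convolution operation is just a small filter sliding over the image, taking a dot product at each position. A minimal NumPy sketch with a toy 3x3 image and a hypothetical edge-detecting filter (valid padding for simplicity; tf.nn.conv2d computes the same cross-correlation, batched and multi-channel):

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Slide the kernel over every position where it fits entirely,
    # taking the elementwise product-and-sum at each location.
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.array([[1, 2, 0],
                  [0, 1, 3],
                  [4, 0, 1]], dtype=float)
edge = np.array([[1, -1]], dtype=float)  # toy horizontal-difference filter
result = conv2d_valid(image, edge)       # [[-1, 2], [-1, -2], [4, -1]]
```

Each output value measures how strongly the local patch matches the filter, which is how convolutional layers learn to respond to edges, corners, and textures.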
Building a LeNet-5 Style CNN
We'll construct a CNN with two convolutional-pooling layers, followed by fully connected layers.
First Convolutional Layer:
x_image = tf.reshape(x, [-1, 28, 28, 1])
W_conv1 = tf.Variable(tf.truncated_normal([5,5,1,32], stddev=0.1))
b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]))
h_conv1 = tf.nn.relu(tf.nn.conv2d(x_image, W_conv1, strides=[1,1,1,1], padding='SAME') + b_conv1)
h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
Second Convolutional Layer:
W_conv2 = tf.Variable(tf.truncated_normal([5,5,32,64], stddev=0.1))
b_conv2 = tf.Variable(tf.constant(0.1, shape=[64]))
h_conv2 = tf.nn.relu(tf.nn.conv2d(h_pool1, W_conv2, strides=[1,1,1,1], padding='SAME') + b_conv2)
h_pool2 = tf.nn.max_pool(h_conv2, ksize=[1,2,2,1], strides=[1,2,2,1], padding='SAME')
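It is worth tracking the spatial dimensions through these layers. With 'SAME' padding the convolutions preserve the size, while each 2x2 max pool with stride 2 halves it. A quick arithmetic sketch (the ceil-based output-size rule for 'SAME' padding):

```python
import math

def same_conv(size, stride=1):
    # 'SAME' padding: output size = ceil(input / stride)
    return math.ceil(size / stride)

def max_pool_2x2(size):
    # 2x2 pooling with stride 2 halves each spatial dimension
    return math.ceil(size / 2)

size = 28                     # MNIST images are 28x28
size = same_conv(size)        # conv1 keeps 28x28
size = max_pool_2x2(size)     # pool1 -> 14x14
size = same_conv(size)        # conv2 keeps 14x14
size = max_pool_2x2(size)     # pool2 -> 7x7
print(size)  # 7
```

So the second pooling layer outputs 7x7 feature maps with 64 channels, giving the 7*7*64 features that are flattened for the fully connected layer.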
Fully Connected Layer with Dropout:
h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
W_fc1 = tf.Variable(tf.truncated_normal([7*7*64, 1024], stddev=0.1))
b_fc1 = tf.Variable(tf.constant(0.1, shape=[1024]))
h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
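Dropout randomly zeroes activations during training and scales the survivors by 1/keep_prob, so the expected activation is unchanged; at test time (keep_prob = 1.0) it is the identity. A NumPy sketch of this "inverted dropout" behavior, which matches what tf.nn.dropout does:

```python
import numpy as np

rng = np.random.RandomState(0)

def dropout(x, keep_prob):
    # Keep each activation with probability keep_prob, scaling the
    # kept values by 1/keep_prob so the expected value is preserved.
    mask = rng.rand(*x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)

x = np.ones(10)
out = dropout(x, 0.5)   # surviving entries become 2.0, the rest 0.0
```

Randomly dropping units during training prevents the network from relying too heavily on any single feature, which reduces overfitting.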
Output Layer:
W_fc2 = tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1))
b_fc2 = tf.Variable(tf.constant(0.1, shape=[10]))
y_conv = tf.matmul(h_fc1_drop, W_fc2) + b_fc2
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
Training and Evaluation
We use the Adam optimizer and include dropout during training:
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
sess.run(tf.global_variables_initializer())
for i in range(20000):
    batch_xs, batch_ys = mnist.train.next_batch(50)
    if i % 100 == 0:
        train_accuracy = accuracy.eval(feed_dict={x: batch_xs, y_: batch_ys, keep_prob: 1.0})
        print('step %d, training accuracy %g' % (i, train_accuracy))
    train_step.run(feed_dict={x: batch_xs, y_: batch_ys, keep_prob: 0.5})
print('test accuracy %g' % accuracy.eval(feed_dict={
    x: mnist.test.images, y_: mnist.test.labels, keep_prob: 1.0}))
This CNN achieves about 99.2% accuracy on the test set.
Conclusion
This guide introduced TensorFlow's core concepts and demonstrated basic machine learning and deep learning workflows using the MNIST dataset. TensorFlow offers extensive tools for building, training, and deploying models. To learn more, explore the official documentation and experiment with your own projects.