ECS289: Scalable Machine Learning Big-O Notations Cho-Jui Hsieh UC Davis
Oct 20, 2015
Outline
Time complexity and big-O notations
Time complexity for basic linear algebra operators
Time Complexity
From Wikipedia: the time complexity of an algorithm quantifies the amount of time taken by an algorithm to run as a function of the length of the input. The time complexity of an algorithm is commonly expressed using big-O notation, which excludes coefficients and lower-order terms.
Although time complexity is a good indication of efficiency, in practical numerical computation the constants sometimes matter. For example, the time for running 1 billion operations of each kind: “exp”: 30.19 secs; “*”: 1.84 secs; “/”: 7.31 secs; “+”: 1.77 secs
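The gap between operation costs can be checked directly, though the absolute numbers are machine- and language-dependent. A minimal timing sketch (in pure Python the interpreter overhead dominates, so the ratios will be much smaller than the C-level timings above):

```python
import math
import time

def time_op(op, reps=100_000):
    """Crudely time `reps` applications of a unary operation (wall clock)."""
    x = 1.0001
    start = time.perf_counter()
    for _ in range(reps):
        op(x)
    return time.perf_counter() - start

t_add = time_op(lambda x: x + x)   # cheap operation
t_exp = time_op(math.exp)          # typically several times slower per call
```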
In this course we will ignore these constants.
Big-O
Definition of O(·): let f and g be two functions. We write f(x) = O(g(x)) as x → ∞ if and only if there exist a positive constant M and an x0 such that |f(x)| ≤ M|g(x)| for all x ≥ x0. In short, f(x) = O(g(x)) means f is upper bounded by g up to a constant factor.
How to show a time complexity of O(g(x))? Exhibit an implementation of the algorithm that requires at most Cg(x) operations for some constant C.
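The definition can be checked numerically for a concrete pair of functions. Below, the illustrative choice f(x) = 3x² + 5x + 7 is shown to satisfy f(x) = O(x²) with witnesses M = 4 and x0 = 7:

```python
def f(x):
    return 3 * x**2 + 5 * x + 7

def g(x):
    return x**2

# |f(x)| <= M|g(x)| for all x >= x0, with M = 4 and x0 = 7:
# 3x^2 + 5x + 7 <= 4x^2  <=>  x^2 - 5x - 7 >= 0, which holds for all x >= 7.
M, x0 = 4, 7
assert all(f(x) <= M * g(x) for x in range(x0, 10_000))
```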
Big-Omega
Definition of Ω(·): let f and g be two functions. We write f(x) = Ω(g(x)) as x → ∞ if and only if there exist a positive constant m and an x0 such that |f(x)| ≥ m|g(x)| for all x ≥ x0. In short, f(x) = Ω(g(x)) means f is lower bounded by g up to a constant factor.
How to show a time complexity of Ω(g(x))? Prove that any implementation requires at least Cg(x) operations for some constant C.
Big-Theta
Definition of Θ(·): let f and g be two functions. We write f(x) = Θ(g(x)) as x → ∞ if and only if there exist positive constants m, M and an x0 such that m|g(x)| ≤ |f(x)| ≤ M|g(x)| for all x ≥ x0. In short, f(x) = Θ(g(x)) means f has the same order as g up to constant factors.
How to show a time complexity of Θ(g(x))? Show both the Big-O and the Big-Omega bounds.
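Both witnesses can be exhibited at once for a Θ bound. Taking the illustrative function f(x) = 3x² + 5x + 7: f(x) ≥ 3x² for all x ≥ 0, and f(x) ≤ 4x² for x ≥ 7, so f(x) = Θ(x²) with m = 3, M = 4, x0 = 7:

```python
def f(x):
    return 3 * x**2 + 5 * x + 7

m, M, x0 = 3, 4, 7
# lower bound: 5x + 7 >= 0, so f(x) >= 3x^2 for every x >= 0
# upper bound: x^2 - 5x - 7 >= 0 for x >= 7, so f(x) <= 4x^2 there
assert all(m * x**2 <= f(x) <= M * x**2 for x in range(x0, 10_000))
```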
Count number of operations
Count the total number of operations (+, −, ∗, /, exp, log, if, ...). Only the order of this count matters, so we express it with big-O notation.
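As an illustration of counting operations, the dot product below is instrumented with an explicit counter (a pedagogical device, not something one would write in real code): each of the m iterations costs one multiply and one add, so the count is 2m = O(m).

```python
def dot_with_count(x, y):
    """Dot product that also reports its arithmetic operation count."""
    s, ops = 0.0, 0
    for xi, yi in zip(x, y):
        s += xi * yi
        ops += 2  # one multiply, one add per coordinate
    return s, ops

# 3-dimensional example: result 32.0, with 2 * 3 = 6 operations
value, ops = dot_with_count([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```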
Dense vector and sparse vector
If x, y ∈ R^m are dense: x + y, x − y, x^T y: O(m) operations
If x, y ∈ R^m, x is dense and y is sparse: x + y, x − y, x^T y: O(nnz(y)) operations
If x, y ∈ R^m and both are sparse: x + y, x − y, x^T y: O(nnz(x) + nnz(y)) operations
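A sketch of the sparse–sparse dot product, representing each vector as an index→value dict (one possible sparse representation): the stated bound is achieved because only stored entries are ever touched.

```python
def sparse_dot(x, y):
    """x^T y for sparse vectors stored as {index: value} dicts.
    Iterates over the smaller dict; each membership test and lookup in the
    other dict is O(1) on average, so the cost is O(min(nnz(x), nnz(y)))."""
    if len(x) > len(y):
        x, y = y, x
    return sum(v * y[i] for i, v in x.items() if i in y)

# the vectors overlap only at index 3: 2.0 * 4.0 = 8.0
result = sparse_dot({0: 1.0, 3: 2.0}, {3: 4.0, 7: 5.0})
```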
Dense Matrix vs Sparse Matrix
Any matrix X ∈ R^{m×n} can be stored as dense or sparse:
Dense matrix: most entries of X are nonzero (mn space)
Sparse matrix: only a few entries of X are nonzero (O(nnz) space)
Dense Matrix Operations
Let A ∈ R^{m×n}, B ∈ R^{m×n}, s ∈ R: A + B, sA, A^T: O(mn) operations
Let A ∈ R^{m×n}, b ∈ R^n: Ab: O(mn) operations
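A direct implementation of the matrix–vector product makes the O(mn) count visible: one multiply–add per entry of A. A minimal pure-Python sketch (real code would call NumPy/BLAS instead):

```python
def matvec(A, b):
    """y = A b for A given as m rows of length n: mn multiply-adds -> O(mn)."""
    return [sum(a_ij * b_j for a_ij, b_j in zip(row, b)) for row in A]

# 2x2 example: [[1, 2], [3, 4]] times [1, 1] gives [3, 7]
y = matvec([[1, 2], [3, 4]], [1, 1])
```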
Dense Matrix Operations
Matrix–matrix multiplication: let A ∈ R^{m×k}, B ∈ R^{k×n}; what is the time complexity of computing AB?
Dense Matrix Operations
Assume A, B ∈ R^{n×n}; what is the time complexity of computing AB?
Naive implementation: O(n^3)
Theoretical best: roughly O(n^2.37) (asymptotically faster, but slower than the naive implementation in practice)
Best way in practice: use BLAS (Basic Linear Algebra Subprograms)
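The naive O(n³) bound comes directly from the triple loop below: n² output entries, each an inner product of length n. (The i, k, j loop order lets the innermost loop scan rows contiguously; it does not change the operation count.)

```python
def matmul(A, B):
    """Naive n x n matrix product: three nested loops -> O(n^3) operations."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            a_ik = A[i][k]
            for j in range(n):
                C[i][j] += a_ik * B[k][j]
    return C

# 2x2 example: [[1,2],[3,4]] @ [[5,6],[7,8]] = [[19, 22], [43, 50]]
C = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```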
Dense Matrix Operations
BLAS matrix product: O(mnk) for computing AB, where A ∈ R^{m×k}, B ∈ R^{k×n}
Computes the matrix product block by block to minimize the cache miss rate
Can be called from C and Fortran; used by MATLAB, R, Python, ...
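In practice one never writes the triple loop by hand: NumPy's `@` operator dispatches to the BLAS `gemm` routine linked into the installation (which BLAS, and how fast, depends on the build — OpenBLAS, MKL, etc.):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((200, 300))   # m x k
B = rng.random((300, 100))   # k x n
C = A @ B                    # O(mnk) work, performed by BLAS gemm

# spot-check one entry against its definition as an inner product
expected = sum(A[0, k] * B[k, 0] for k in range(300))
assert abs(C[0, 0] - expected) < 1e-9
```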
Sparse Matrix Operations
Widely used formats: Compressed Sparse Column (CSC), Compressed Sparse Row (CSR), ...
CSR stores an m × n matrix with nnz nonzeros using three arrays:
val (nnz real numbers): the values of the nonzero elements
col_ind (nnz integers): the column index of each value
row_ptr (m + 1 integers): the positions in val at which each row starts
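A pure-Python sketch of building the three CSR arrays from a dense matrix (the array names val / col_ind / row_ptr follow one common convention; SciPy, for instance, calls them data / indices / indptr):

```python
def to_csr(dense):
    """Convert a dense row-major matrix (list of lists) to CSR arrays."""
    val, col_ind, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                val.append(a)       # nonzero values, row by row
                col_ind.append(j)   # column index of each value
        row_ptr.append(len(val))    # running nnz marks where the next row starts
    return val, col_ind, row_ptr

# A = [[1, 0, 2, 0],
#      [0, 0, 3, 0],
#      [4, 5, 0, 0]]
# -> val = [1, 2, 3, 4, 5], col_ind = [0, 2, 2, 0, 1], row_ptr = [0, 2, 3, 5]
val, col_ind, row_ptr = to_csr([[1, 0, 2, 0], [0, 0, 3, 0], [4, 5, 0, 0]])
```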
Sparse Matrix Operations
If A ∈ R^{m×n} (sparse), B ∈ R^{m×n} (sparse or dense), s ∈ R: A + B, sA, A^T: O(nnz) operations
If A ∈ R^{m×n} (sparse), b ∈ R^n: Ab: O(nnz) operations
If A ∈ R^{m×k} (sparse), B ∈ R^{k×n} (dense): AB: O(nnz(A) · n) operations (use sparse BLAS)
If A ∈ R^{m×k} (sparse), B ∈ R^{k×n} (sparse): AB: O(nnz(A) nnz(B)/k) operations on average, O(nnz(A) · n) in the worst case; the resulting matrix is typically much denser
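The O(nnz) matrix–vector product follows directly from the CSR layout: row i's nonzeros live in val[row_ptr[i]:row_ptr[i+1]], and each one is touched exactly once. A minimal sketch:

```python
def csr_matvec(val, col_ind, row_ptr, b):
    """y = A b with A in CSR form: one multiply-add per nonzero -> O(nnz)."""
    m = len(row_ptr) - 1
    y = [0.0] * m
    for i in range(m):
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[p] * b[col_ind[p]]
    return y

# A = [[1, 0, 2, 0], [0, 0, 3, 0], [4, 5, 0, 0]] in CSR form, times b = [1,1,1,1]:
# row sums give [3, 3, 9]
y = csr_matvec([1, 2, 3, 4, 5], [0, 2, 2, 0, 1], [0, 2, 3, 5], [1, 1, 1, 1])
```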