
ECS289: Scalable Machine Learning
Big-O Notations
Cho-Jui Hsieh, UC Davis

Oct 20, 2015

Outline
Time complexity and Big-O notations
Time complexity for basic linear algebra operators

Time Complexity
From Wikipedia: the time complexity of an algorithm quantifies the amount of time taken by an algorithm to run as a function of the length of its input. The time complexity of an algorithm is commonly expressed using big-O notation, which excludes coefficients and lower-order terms.

Although time complexity is a good indicator of efficiency, in practical numerical computation the constants sometimes matter. For example, the time to run 1 billion operations of each kind:
“exp”: 30.19 secs
“*”: 1.84 secs
“/”: 7.31 secs
“+”: 1.77 secs

In this course we will ignore these constants.
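As a rough illustration, such constant-factor differences can be measured with Python's `timeit` module. Absolute numbers depend on the machine and language; this is not the benchmark the slides used, just a sketch of how one could measure it:

```python
# Rough illustration of constant-factor differences between
# arithmetic operations, using timeit. Numbers depend heavily on
# machine and language; this is not the slides' benchmark.
import math
import timeit

n = 10**5  # far fewer than 1 billion, to keep the demo fast
for name, stmt in [("+", "x + y"), ("*", "x * y"),
                   ("/", "x / y"), ("exp", "math.exp(x)")]:
    t = timeit.timeit(stmt, number=n,
                      globals={"math": math, "x": 1.1, "y": 2.2})
    print(f"{name:>3}: {t:.4f} secs for {n} operations")
```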

Big-O
Definition of O(·): let f and g be two functions. We write f(x) = O(g(x)) as x → ∞ if and only if there exist a positive constant M and x0 such that |f(x)| ≤ M|g(x)| for all x ≥ x0.
In short, f(x) = O(g(x)) means f is upper bounded by g up to a constant factor.
How to show a time complexity of O(g(x))? Show that there exists an implementation of the algorithm that requires at most C·g(x) operations for some constant C.
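The definition can be sanity-checked numerically on a concrete example. Take the (illustrative, not from the slides) functions f(x) = 3x² + 5x and g(x) = x²; then f(x) = O(g(x)), witnessed by M = 4 and x0 = 5, since 3x² + 5x ≤ 4x² whenever x ≥ 5:

```python
# Numerically check the Big-O definition on a concrete example:
# f(x) = 3x^2 + 5x is O(x^2), witnessed by M = 4 and x0 = 5.
def f(x):
    return 3 * x**2 + 5 * x

def g(x):
    return x**2

M, x0 = 4, 5
# |f(x)| <= M * |g(x)| for all sampled x >= x0
assert all(abs(f(x)) <= M * abs(g(x)) for x in range(x0, 10**6, 997))
print("witness (M, x0) =", (M, x0))  # witness (M, x0) = (4, 5)
```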

Big-Omega
Definition of Ω(·): let f and g be two functions. We write f(x) = Ω(g(x)) as x → ∞ if and only if there exist a positive constant m and x0 such that |f(x)| ≥ m|g(x)| for all x ≥ x0.
In short, f(x) = Ω(g(x)) means f is lower bounded by g up to a constant factor.
How to show a time complexity of Ω(g(x))? Prove that any implementation requires at least C·g(x) operations for some constant C.

Big-Theta
Definition of Θ(·): let f and g be two functions. We write f(x) = Θ(g(x)) as x → ∞ if and only if there exist positive constants m, M and x0 such that m|g(x)| ≤ |f(x)| ≤ M|g(x)| for all x ≥ x0.
In short, f(x) = Θ(g(x)) means f has the same order as g up to constant factors.
How to show a time complexity of Θ(g(x))? Show both the Big-O and the Big-Omega bounds.

Count the Number of Operations
Count the total number of operations (+, −, ∗, /, exp, log, if, ...). Only the “order” of this count is needed, which is then expressed in big-O notation.

Dense Vector and Sparse Vector

If x, y ∈ Rm are both dense: x + y, x − y, xᵀy take O(m) operations.
If x, y ∈ Rm, x is dense and y is sparse: x + y, x − y, xᵀy take O(nnz(y)) operations.
If x, y ∈ Rm are both sparse: x + y, x − y, xᵀy take O(nnz(x) + nnz(y)) operations.
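A minimal sketch of the first two cases, representing a sparse vector as an index→value dict (one possible format; the slides do not fix one):

```python
# Sketch: dot products at the costs claimed above.
# A sparse vector is represented as a dict {index: value}
# (an assumed format; the slides do not fix one).

def dot_dense_dense(x, y):
    # O(m): every coordinate is touched.
    return sum(xi * yi for xi, yi in zip(x, y))

def dot_dense_sparse(x, y_sparse):
    # O(nnz(y)): iterate only over the nonzeros of y.
    return sum(v * x[i] for i, v in y_sparse.items())

x = [1.0, 2.0, 0.0, 4.0]
y_sparse = {1: 3.0, 3: -1.0}          # y = [0, 3, 0, -1]
print(dot_dense_sparse(x, y_sparse))  # 2*3 + 4*(-1) = 2.0
```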

Dense Matrix vs Sparse Matrix
Any matrix X ∈ Rm×n can be stored as dense or sparse.
Dense matrix: most entries of X are nonzero (mn space).
Sparse matrix: only a few entries of X are nonzero (O(nnz) space).

Dense Matrix Operations

Let A ∈ Rm×n, B ∈ Rm×n, s ∈ R. Then A + B, sA, Aᵀ take O(mn) operations.
Let A ∈ Rm×n, b ∈ Rn×1. Then Ab takes O(mn) operations.

Dense Matrix Operations
Matrix-matrix multiplication: let A ∈ Rm×k, B ∈ Rk×n. What is the time complexity of computing AB?

Dense Matrix Operations
Assume A, B ∈ Rn×n. What is the time complexity of computing AB?
Naive implementation: O(n³).
Theoretical best: O(n^2.xxx) (asymptotically faster, but slower than the naive implementation in practice).
Best way in practice: use BLAS (Basic Linear Algebra Subprograms).
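The naive O(n³) algorithm is just three nested loops: n² output entries, each costing O(n) multiply-adds. A sketch over plain lists of lists:

```python
# Naive O(n^3) matrix product: three nested loops.
# For an n x n result, each of the n^2 entries costs O(n) work.
def matmul_naive(A, B):
    n, k, m = len(A), len(B), len(B[0])
    assert all(len(row) == k for row in A), "inner dimensions must match"
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_naive(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```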

Dense Matrix Operations
BLAS matrix product: O(mnk) for computing AB, where A ∈ Rm×k, B ∈ Rk×n.
Computes the product block by block to minimize the cache miss rate.
Can be called from C and Fortran; can be used in MATLAB, R, Python, ...

Sparse Matrix Operations
Widely-used formats: Compressed Sparse Column (CSC), Compressed Sparse Row (CSR), ...
CSR uses three arrays to store an m × n matrix with nnz nonzeros:

val (nnz real numbers): the values of the nonzero elements
col_ind (nnz integers): the column indices corresponding to the values
row_ptr (m + 1 integers): the index in val at which each row starts
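The three arrays can be built in one pass over the matrix. A sketch for a small dense input (array names follow the description above):

```python
# Sketch: build the three CSR arrays (val, col_ind, row_ptr)
# for a small dense matrix, storing only its nonzeros.
def to_csr(A):
    val, col_ind, row_ptr = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                val.append(a)      # nonzero value
                col_ind.append(j)  # its column index
        row_ptr.append(len(val))   # where the next row starts in val
    return val, col_ind, row_ptr

A = [[10, 0, 0],
     [0, 20, 30],
     [0, 0, 0]]
print(to_csr(A))  # ([10, 20, 30], [0, 1, 2], [0, 1, 3, 3])
```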

Sparse Matrix Operations
If A ∈ Rm×n (sparse), B ∈ Rm×n (sparse or dense), s ∈ R: A + B, sA, Aᵀ take O(nnz) operations.
If A ∈ Rm×n (sparse), b ∈ Rn×1: Ab takes O(nnz) operations.
If A ∈ Rm×k (sparse), B ∈ Rk×n (dense): AB takes O(nnz(A)·n) operations (use sparse BLAS).
If A ∈ Rm×k (sparse), B ∈ Rk×n (sparse): AB takes O(nnz(A)·nnz(B)/k) operations on average and O(nnz(A)·n) in the worst case; the resulting matrix is typically much denser.
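The O(nnz) matrix-vector product falls directly out of the CSR layout: each stored nonzero is touched exactly once. A sketch using the three arrays defined earlier:

```python
# Sketch: sparse matrix-vector product Ab from the CSR arrays
# (val, col_ind, row_ptr). Each stored nonzero is visited once,
# so the cost is O(nnz) as claimed above.
def csr_matvec(val, col_ind, row_ptr, b):
    m = len(row_ptr) - 1
    y = [0.0] * m
    for i in range(m):
        # nonzeros of row i live in val[row_ptr[i]:row_ptr[i+1]]
        for p in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[p] * b[col_ind[p]]
    return y

# A = [[10, 0, 0], [0, 20, 30], [0, 0, 0]] in CSR form:
val, col_ind, row_ptr = [10, 20, 30], [0, 1, 2], [0, 1, 3, 3]
print(csr_matvec(val, col_ind, row_ptr, [1.0, 2.0, 3.0]))  # [10.0, 130.0, 0.0]
```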
