I encountered LeetCode problem 1307. Verbal Arithmetic Puzzle in Weekly Contest 169, and it was fun to figure out why my first submission got a TLE.
Question
Given an equation represented by words on the left side and result on the right side, you need to check if the equation is solvable under the following rules:
- Each character is decoded as one digit (0 - 9).
- Every pair of different characters must map to different digits.
- Each words[i] and result are decoded as one number without leading zeros.
- The sum of the numbers on the left side (words) must equal the number on the right side (result).
Return true if the equation is solvable, otherwise return false.
Example 1:
Input: words = ["SEND","MORE"], result = "MONEY"
Output: true
Explanation: Map 'S'->9, 'E'->5, 'N'->6, 'D'->7, 'M'->1, 'O'->0, 'R'->8, 'Y'->2
Such that: "SEND" + "MORE" = "MONEY" , 9567 + 1085 = 10652
Example 2:
Input: words = ["SIX","SEVEN","SEVEN"], result = "TWENTY"
Output: true
Explanation: Map 'S'->6, 'I'->5, 'X'->0, 'E'->8, 'V'->7, 'N'->2, 'T'->1, 'W'->3, 'Y'->4
Such that: "SIX" + "SEVEN" + "SEVEN" = "TWENTY" , 650 + 68782 + 68782 = 138214
Example 3:
Input: words = ["THIS","IS","TOO"], result = "FUNNY"
Output: true
Example 4:
Input: words = ["LEET","CODE"], result = "POINT"
Output: false
Constraints:
- 2 <= words.length <= 5
- 1 <= words[i].length, result.length <= 7
- words[i] and result contain only uppercase English letters.
- The number of different characters used in the expression is at most 10.
My First Submission (TLE)
class Solution {
public:
    set<char> c_set;          // all distinct characters
    vector<char> c;
    set<char> no_zeros;       // leading characters, which cannot be 0
    map<char, int> l_factors; // sum of powers of 10 each char contributes on the left
    map<char, int> r_factors; // same, for the result
    set<int> used;            // digits already assigned

    bool isSolvable(vector<string>& words, string result) {
        for (int i = 0; i < words.size(); i++) {
            int factor = 1;
            for (int j = 0; j < words[i].size(); j++) {
                char tmp = words[i][words[i].size() - j - 1];
                if (j == words[i].size() - 1) {
                    no_zeros.insert(tmp);
                }
                c_set.insert(tmp);
                if (l_factors.find(tmp) == l_factors.end()) {
                    l_factors[tmp] = factor;
                } else {
                    l_factors[tmp] += factor;
                }
                factor *= 10;
            }
        }
        for (int i = 0, factor = 1; i < result.size(); i++) {
            char tmp = result[result.size() - i - 1];
            if (i == result.size() - 1) {
                no_zeros.insert(tmp);
            }
            c_set.insert(tmp);
            if (r_factors.find(tmp) == r_factors.end()) {
                r_factors[tmp] = factor;
            } else {
                r_factors[tmp] += factor;
            }
            factor *= 10;
        }
        for (auto i : c_set) {
            c.push_back(i);
        }
        return dfs(0, 0, 0);
    }

    // Try every unused digit for character c[r]; left/right accumulate
    // the two sides of the equation as digits are assigned.
    bool dfs(int r, int left, int right) {
        if (r == c.size()) {
            return left == right;
        }
        char tmp = c[r];
        int l_factor = 0;
        if (l_factors.find(tmp) != l_factors.end()) {
            l_factor = l_factors[tmp];
        }
        int r_factor = 0;
        if (r_factors.find(tmp) != r_factors.end()) {
            r_factor = r_factors[tmp];
        }
        for (int i = 0; i < 10; i++) {
            if (i == 0 && no_zeros.find(tmp) != no_zeros.end()) {
                continue;
            }
            if (used.find(i) != used.end()) {
                continue;
            }
            used.insert(i);
            if (dfs(r + 1, left + i * l_factor, right + i * r_factor)) {
                return true;
            }
            used.erase(i);
        }
        return false;
    }
};
Set vs Array
The main problem with the above solution is the set<int> used. We insert into and erase from the set O(n * 10!/(10-n)!) times, where n is the number of unique characters in the input strings, so the cost of each insert and erase matters. Interestingly, changing the set to an array is enough to pass the test cases. The complexity difference between set and array here should only be a constant factor, since the size of the set is bounded (10 in this question). That constant must be large enough for LeetCode's test cases to reliably TLE my solution. So what's the constant here?
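For concreteness, here is a sketch of the array-based variant that passes: the search is identical, and only the used bookkeeping changes. (ArraySolution is just my name for it here; LeetCode expects the class to be called Solution.)

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>
using namespace std;

// Same search as above; only the digit bookkeeping changes:
// set<int> used becomes bool used[10], so the find/insert/erase
// in the DFS loop each become a single array access.
class ArraySolution {
public:
    set<char> c_set;
    vector<char> c;
    set<char> no_zeros;
    map<char, int> l_factors;
    map<char, int> r_factors;
    bool used[10] = {false};  // was: set<int> used

    bool isSolvable(vector<string>& words, string result) {
        for (size_t i = 0; i < words.size(); i++) {
            int factor = 1;
            for (size_t j = 0; j < words[i].size(); j++) {
                char tmp = words[i][words[i].size() - j - 1];
                if (j == words[i].size() - 1) no_zeros.insert(tmp);
                c_set.insert(tmp);
                l_factors[tmp] += factor;  // operator[] default-initializes to 0
                factor *= 10;
            }
        }
        int factor = 1;
        for (size_t i = 0; i < result.size(); i++) {
            char tmp = result[result.size() - i - 1];
            if (i == result.size() - 1) no_zeros.insert(tmp);
            c_set.insert(tmp);
            r_factors[tmp] += factor;
            factor *= 10;
        }
        c.assign(c_set.begin(), c_set.end());
        return dfs(0, 0, 0);
    }

    bool dfs(size_t r, int left, int right) {
        if (r == c.size()) return left == right;
        char tmp = c[r];
        int l_factor = l_factors.count(tmp) ? l_factors[tmp] : 0;
        int r_factor = r_factors.count(tmp) ? r_factors[tmp] : 0;
        for (int i = 0; i < 10; i++) {
            if (i == 0 && no_zeros.count(tmp)) continue;
            if (used[i]) continue;   // was: used.find(i) != used.end()
            used[i] = true;          // was: used.insert(i)
            if (dfs(r + 1, left + i * l_factor, right + i * r_factor))
                return true;
            used[i] = false;         // was: used.erase(i)
        }
        return false;
    }
};
```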
Below is a poor man’s benchmark to figure out the constant:
#include <chrono>
#include <cstdlib>
#include <cstring>  // for memset
#include <ctime>
#include <iostream>
#include <set>
#include <string>
using namespace std;

#define SIZE 10
#define PRE_FILL (SIZE / 2)
#define OP 1000000

set<int> test_set;
bool test_array[SIZE];

void set_find() {
    int count = 0;
    for (int i = 0; i < OP; i++) {
        int j = std::rand() % SIZE;
        if (test_set.find(j) != test_set.end()) {
            count += 1;
        }
    }
}

void array_find() {
    int count = 0;
    for (int i = 0; i < OP; i++) {
        int j = std::rand() % SIZE;
        if (test_array[j]) {
            count += 1;
        }
    }
}

void set_insert_and_delete() {
    for (int i = 0; i < OP; i++) {
        int j = std::rand() % SIZE;
        test_set.insert(j);
        test_set.erase(j);
    }
}

void array_insert_and_delete() {
    for (int i = 0; i < OP; i++) {
        int j = std::rand() % SIZE;
        test_array[j] = true;
        test_array[j] = false;
    }
}

void measure(string name, void (*func)()) {
    auto start = std::chrono::high_resolution_clock::now();
    func();
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << name << ": " << elapsed.count() << " s\n";
}

int main(int argc, char *argv[]) {
    std::srand(std::time(nullptr)); // use current time as seed for random generator
    memset(test_array, 0, sizeof test_array);
    for (int i = 0; i < PRE_FILL; i++) {
        int j = std::rand() % SIZE;
        test_set.insert(j);
        test_array[j] = true;
    }
    measure("set_find", set_find);
    measure("array_find", array_find);
    measure("set_insert_and_delete", set_insert_and_delete);
    measure("array_insert_and_delete", array_insert_and_delete);
}
Output (I was running on a MacBook Pro (Retina, 13-inch, Late 2012) with a 2.5 GHz Intel Core i5):
set_find: 0.161238 s
array_find: 0.0205961 s
set_insert_and_delete: 0.515822 s
array_insert_and_delete: 0.0111742 s
The constant factor is around 50 (in this specific run, 0.515822 / 0.0111742 ≈ 46), which is larger than I expected. I had assumed it would be close to log(10), since the underlying red-black tree operations are O(log N), but it turns out to be much more expensive than that. So it's pretty expensive to use a set in this case.
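As an aside (not part of my submission): since only ten flags are needed, they even fit into a single int used as a bitmask, making each test/set/clear a single bit operation. A minimal sketch:

```cpp
// Sketch: pack the ten "digit already used" flags into one int.
// Bit d set means digit d is taken.
struct DigitMask {
    int mask = 0;
    bool is_used(int d) const { return (mask >> d) & 1; }  // was: used.find(d)
    void take(int d)          { mask |= 1 << d; }          // was: used.insert(d)
    void release(int d)       { mask &= ~(1 << d); }       // was: used.erase(d)
};
```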
Lesson Learned
Rules of thumb matter. Had I been aware of the difference, I would certainly have reached for an array first.
Here are some related articles I found while searching on the topic:
- Verbal Arithmetic and Mastermind (2016)
- C++ Set Performance by Tino Didriksen
- Why you shouldn’t use set (and what you should use instead) by Matt Austern
Speaking of rules of thumb, here are the latency numbers Jeff Dean thinks every programmer should know:
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14x L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy 3,000 ns 3 us
Send 1K bytes over 1 Gbps network 10,000 ns 10 us
Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD
Read 1 MB sequentially from memory 250,000 ns 250 us
Round trip within same datacenter 500,000 ns 500 us
Read 1 MB sequentially from SSD* 1,000,000 ns 1,000 us 1 ms ~1GB/sec SSD, 4X memory
Disk seek 10,000,000 ns 10,000 us 10 ms 20x datacenter roundtrip
Read 1 MB sequentially from disk 20,000,000 ns 20,000 us 20 ms 80x memory, 20X SSD
Send packet CA->Netherlands->CA 150,000,000 ns 150,000 us 150 ms
Notes
-----
1 ns = 10^-9 seconds
1 us = 10^-6 seconds = 1,000 ns (it's called one microsecond.)
1 ms = 10^-3 seconds = 1,000 us = 1,000,000 ns (it's called one millisecond.)
Credit
------
By Jeff Dean: http://research.google.com/people/jeff/
Originally by Peter Norvig: http://norvig.com/21-days.html#answers
Contributions
-------------
'Humanized' comparison: https://gist.github.com/hellerbarde/2843375
Visual comparison chart: http://i.imgur.com/k0t1e.png