Assume an application executes a total of 500,000 instructions, of which 20% are load/store instructions with an average CPI of 6 and the remaining 80% are ALU instructions with an average CPI of 1. If we double the clock rate without reducing the memory latency, the average CPI of the load/store instructions also doubles, to 12. What is the speedup after this change?
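
One way to work through the arithmetic is sketched below, assuming the usual model: execution time = instruction count × average CPI / clock rate, and speedup = old time / new time. The function name `execution_time` and the normalized `BASE_CLOCK` value are illustrative choices, not part of the problem. The absolute clock rate is not given, but it cancels out of the ratio.

```python
def execution_time(instr_count, mem_fraction, mem_cpi, alu_cpi, clock_rate):
    """Execution time = instruction count * weighted-average CPI / clock rate."""
    avg_cpi = mem_fraction * mem_cpi + (1 - mem_fraction) * alu_cpi
    return instr_count * avg_cpi / clock_rate


INSTRUCTIONS = 500_000
MEM_FRACTION = 0.20   # share of load/store instructions
BASE_CLOCK = 1.0      # assumed normalized clock rate; cancels in the speedup ratio

# Before: load/store CPI = 6, ALU CPI = 1, original clock rate.
t_old = execution_time(INSTRUCTIONS, MEM_FRACTION, 6, 1, BASE_CLOCK)

# After: clock rate doubled, load/store CPI doubled to 12, ALU CPI unchanged.
t_new = execution_time(INSTRUCTIONS, MEM_FRACTION, 12, 1, 2 * BASE_CLOCK)

speedup = t_old / t_new
print(f"speedup = {speedup:.2f}")  # prints "speedup = 1.25"
```

Under these assumptions the average CPI rises from 0.2 × 6 + 0.8 × 1 = 2.0 to 0.2 × 12 + 0.8 × 1 = 3.2, so the cycle count grows from 1,000,000 to 1,600,000. Doubling the clock rate halves the time per cycle, giving a speedup of (1,000,000 / f) / (1,600,000 / 2f) = 1.25 rather than 2, because the memory-bound instructions gain nothing from the faster clock.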