J'essayais de résoudre cet exercice depuis www.spoj.com: FCTRL - Factorial
Vous n'avez pas vraiment besoin de le lire, faites-le simplement si vous êtes curieux :)
Je l'ai d'abord implémenté en C ++ (voici ma solution):
#include <iostream>
using namespace std;
int main() {
unsigned int num_of_inputs;
unsigned int fact_num;
unsigned int num_of_trailing_zeros;
std::ios_base::sync_with_stdio(false); // turn off synchronization with the C library’s stdio buffers (from https://stackoverflow.com/a/22225421/5218277)
cin >> num_of_inputs;
while (num_of_inputs--)
{
cin >> fact_num;
num_of_trailing_zeros = 0;
for (unsigned int fives = 5; fives <= fact_num; fives *= 5)
num_of_trailing_zeros += fact_num/fives;
cout << num_of_trailing_zeros << "\n";
}
return 0;
}
I uploaded it as the solution for g++ 5.1
The result was: Time 0.18 Mem 3.3M
But then I saw some comments which claimed that their time execution was less than 0.1. Since I couldn't think about faster algorithm I tried to implement the same code in C:
#include <stdio.h>
int main() {
unsigned int num_of_inputs;
unsigned int fact_num;
unsigned int num_of_trailing_zeros;
scanf("%d", &num_of_inputs);
while (num_of_inputs--)
{
scanf("%d", &fact_num);
num_of_trailing_zeros = 0;
for (unsigned int fives = 5; fives <= fact_num; fives *= 5)
num_of_trailing_zeros += fact_num/fives;
printf("%d", num_of_trailing_zeros);
printf("%s","\n");
}
return 0;
}
I uploaded it as the solution for gcc 5.1
This time the result was: Time 0.02 Mem 2.1M
Now the code is almost the same, I added std::ios_base::sync_with_stdio(false);
to the C++ code as was suggested here to turn off the synchronization with the C library’s stdio buffers. I also split the printf("%d\n", num_of_trailing_zeros);
to printf("%d", num_of_trailing_zeros); printf("%s","\n");
to compensate for double call of operator<<
in cout << num_of_trailing_zeros << "\n";
.
But I still saw x9 better performance and lower memory usage in C vs. C++ code.
Why is that?
EDIT
I fixed unsigned long
to unsigned int
in the C code. It should have been unsigned int
and the results which are shown above are related to the new (unsigned int
) version.
std::ostringstream
to accumulate the output and sending it tostd::cout
all at once at the end brings the time down to 0.02. Usingstd::cout
in a loop is simply slower in their environment and I don't think there's a simple way to improve it.Réponses:
Both programs do exactly the same thing. They use the same exact algorithm, and given its low complexity, their performance is mostly bound to efficiency of the input and output handling.
scanning the input with
scanf("%d", &fact_num);
on one side andcin >> fact_num;
on the other does not seem very costly either way. In fact it should be less costly in C++ since the type of conversion is known at compile time and the correct parser can be invoked directly by the C++ compiler. The same holds for the output. You even make a point of writing a separate call forprintf("%s","\n");
, but the C compiler is good enough to compile this as a call toputchar('\n');
.So looking at the complexity of both the I/O and computation, the C++ version should be faster than the C version.
Completely disabling the buffering of
stdout
slows the C implementation to something even slower than the C++ version. Another test by AlexLop with anfflush(stdout);
after the lastprintf
yields similar performance as the C++ version. It is not as slow as completely disabling buffering because output is written to the system in small chunks instead of one byte at a time.This seems to point to a specific behavior in your C++ library: I suspect your system's implementation of
cin
andcout
flushes the output tocout
when input is requested fromcin
. Some C libraries do this as well, but usually only when reading/writing to and from the terminal. The benchmarking done by the www.spoj.com site probably redirects input and output to and from files.AlexLop did another test: reading all the inputs at once in a vector and subsequently computing and writing all output helps understanding why the C++ version is so much slower. It increases performance to that of the C version, this proves my point and removes suspicion on the C++ formatting code.
Another test by Blastfurnace, storing all outputs in an
std::ostringstream
and flushing that in one blast at the end, does improve the C++ performance to that of the basic C version. QED.PS: your algorithm is incorrect for
fact_num >= UINT_MAX / 5
becausefives *= 5
will overflow and wrap around before it becomes> fact_num
. You can correct this by makingfives
anunsigned long
or anunsigned long long
if one of these types is larger thanunsigned int
. Also use%u
as thescanf
format. You are lucky the guys at www.spoj.com are not too strict in their benchmarks.EDIT: As later explained by vitaux, this behavior is indeed mandated by the C++ standard.
cin
is tied tocout
by default. An input operation fromcin
for which the input buffer needs refilling will causecout
to flush pending output. In the OP's implementation,cin
seems to flushcout
systematically, which is a bit overkill and visibly inefficient.Ilya Popov provided a simple solution for this:
cin
can be untied fromcout
by casting another magical spell in addition tostd::ios_base::sync_with_stdio(false);
:Also note that such forced flush also occurs when using
std::endl
instead of'\n'
to produce an end of line oncout
. Changing the output line to the more C++ idiomatic and innocent lookingcout << num_of_trailing_zeros << endl;
would degrade performance the same way.la source
std::ostringstream
and outputting it all once at the end brings the time down to parity with the C version.ostringstream
for the output and it gave Time 0.02 QED :). Regarding thefact_num >= UINT_MAX / 5
, GOOD point!vector
and then processing the calculations (withoutostringstream
) gives the same result! Time 0.02. Combining bothvector
andostringstream
doesn't improve it more. Same Time 0.02sizeof(int) == sizeof(long long)
is this: add a test in the body of the loop afternum_of_trailing_zeros += fact_num/fives;
to check iffives
has reached its maximum:if (fives > UINT_MAX / 5) break;
Another trick to make
iostream
s faster when you use bothcin
andcout
is to callcin.tie(nullptr);
By default, when you input anything from
cin
, it flushescout
. It can significantly harm performance if you do interleaved input and output. This is done for the command line interface uses, where you show some prompt and then wait for data:std::string name; cout << "Enter your name:"; cin >> name;
In this case you want to make sure the prompt is actually shown before you start waiting for input. With the line above you break that tie,
cin
andcout
become independent.Since C++11, one more way to achieve better performance with iostreams is to use
std::getline
together withstd::stoi
, like this:std::string line; for (int i = 0; i < n && std::getline(std::cin, line); ++i) { int x = std::stoi(line); }
This way can come close to C-style in performance, or even surpass
scanf
. Usinggetchar
and especiallygetchar_unlocked
together with handwritten parsing still provides better performance.PS. I have written a post comparing several ways to input numbers in C++, useful for online judges, but it is only in Russian, sorry. The code samples and the final table, however, should be understandable.
la source
std::readline
andstd::stoi
is not functionally equivalent to the OPs code. Bothcin >> x;
andscanf("%f", &x);
accept ant whitespace as separator, there can be multiple numbers on the same line.The problem is that, quoting cppreference:
This is easy to test: if you replace
cin >> fact_num;
with
scanf("%d", &fact_num);
and same for
cin >> num_of_inputs
but keepcout
you'll get pretty much the same performance in your C++ version (or, rather IOStream version) as in C one:The same happens if you keep
cin
but replacecout << num_of_trailing_zeros << "\n";
with
printf("%d", num_of_trailing_zeros); printf("%s","\n");
A simple solution is to untie
cout
andcin
as mentioned by Ilya Popov:cin.tie(nullptr);
Standard library implementations are allowed to omit the call to flush in certain cases, but not always. Here's a quote from C++14 27.7.2.1.3 (thanks to chqrlie):
la source
is.tie()
is not a null pointer, the function callsis.tie()->flush()
to synchronize the output sequence with any associated external C stream. Except that this call can be suppressed if the put area ofis.tie()
is empty. Further an implementation is allowed to defer the call to flush until a call ofis.rdbuf()->underflow()
occurs. If no such call occurs before the sentry object is destroyed, the call to flush may be eliminated entirely.