Grande différence (x9) dans le temps d'exécution entre du code presque identique en C et C ++

Question 1

J'essayais de résoudre cet exercice depuis www.spoj.com: FCTRL - Factorial

Vous n'avez pas vraiment besoin de le lire, faites-le simplement si vous êtes curieux :)

Je l'ai d'abord implémenté en C ++ (voici ma solution):

#include <iostream>
using namespace std;

int main() {
    unsigned int num_of_inputs;
    unsigned int fact_num;
    unsigned int num_of_trailing_zeros;

    std::ios_base::sync_with_stdio(false); // turn off synchronization with the C library’s stdio buffers (from https://stackoverflow.com/a/22225421/5218277)

    cin >> num_of_inputs;

    while (num_of_inputs--)
    {
        cin >> fact_num;

        num_of_trailing_zeros = 0;

        for (unsigned int fives = 5; fives <= fact_num; fives *= 5)
            num_of_trailing_zeros += fact_num/fives;

        cout << num_of_trailing_zeros << "\n";
    }

    return 0;
}

I uploaded it as the solution for g++ 5.1

The result was: Time 0.18 Mem 3.3M

But then I saw some comments which claimed that their time execution was less than 0.1. Since I couldn't think about faster algorithm I tried to implement the same code in C:

#include <stdio.h>

int main() {
    unsigned int num_of_inputs;
    unsigned int fact_num;
    unsigned int num_of_trailing_zeros;

    scanf("%d", &num_of_inputs);

    while (num_of_inputs--)
    {
        scanf("%d", &fact_num);

        num_of_trailing_zeros = 0;

        for (unsigned int fives = 5; fives <= fact_num; fives *= 5)
            num_of_trailing_zeros += fact_num/fives;

        printf("%d", num_of_trailing_zeros);
        printf("%s","\n");
    }

    return 0;
}

I uploaded it as the solution for gcc 5.1

This time the result was: Time 0.02 Mem 2.1M

Now the code is almost the same, I added std::ios_base::sync_with_stdio(false); to the C++ code as was suggested here to turn off the synchronization with the C library’s stdio buffers. I also split the printf("%d\n", num_of_trailing_zeros); to printf("%d", num_of_trailing_zeros); printf("%s","\n"); to compensate for double call of operator<< in cout << num_of_trailing_zeros << "\n";.

But I still saw x9 better performance and lower memory usage in C vs. C++ code.

Why is that?

EDIT

I fixed unsigned long to unsigned int in the C code. It should have been unsigned int and the results which are shown above are related to the new (unsigned int) version.

Question 2

Both programs do exactly the same thing. They use the same exact algorithm, and given its low complexity, their performance is mostly bound to efficiency of the input and output handling.

scanning the input with scanf("%d", &fact_num); on one side and cin >> fact_num; on the other does not seem very costly either way. In fact it should be less costly in C++ since the type of conversion is known at compile time and the correct parser can be invoked directly by the C++ compiler. The same holds for the output. You even make a point of writing a separate call for printf("%s","\n");, but the C compiler is good enough to compile this as a call to putchar('\n');.

So looking at the complexity of both the I/O and computation, the C++ version should be faster than the C version.

Completely disabling the buffering of stdout slows the C implementation to something even slower than the C++ version. Another test by AlexLop with an fflush(stdout); after the last printf yields similar performance as the C++ version. It is not as slow as completely disabling buffering because output is written to the system in small chunks instead of one byte at a time.

This seems to point to a specific behavior in your C++ library: I suspect your system's implementation of cin and cout flushes the output to cout when input is requested from cin. Some C libraries do this as well, but usually only when reading/writing to and from the terminal. The benchmarking done by the www.spoj.com site probably redirects input and output to and from files.

AlexLop did another test: reading all the inputs at once in a vector and subsequently computing and writing all output helps understanding why the C++ version is so much slower. It increases performance to that of the C version, this proves my point and removes suspicion on the C++ formatting code.

Another test by Blastfurnace, storing all outputs in an std::ostringstream and flushing that in one blast at the end, does improve the C++ performance to that of the basic C version. QED.

Interlacing input from cin and output to cout seems to cause very inefficient I/O handling, defeating the stream buffering scheme. reducing performance by a factor of 10.

PS: your algorithm is incorrect for fact_num >= UINT_MAX / 5 because fives *= 5 will overflow and wrap around before it becomes > fact_num. You can correct this by making fives an unsigned long or an unsigned long long if one of these types is larger than unsigned int. Also use %u as the scanf format. You are lucky the guys at www.spoj.com are not too strict in their benchmarks.

EDIT: As later explained by vitaux, this behavior is indeed mandated by the C++ standard. cin is tied to cout by default. An input operation from cin for which the input buffer needs refilling will cause cout to flush pending output. In the OP's implementation, cin seems to flush cout systematically, which is a bit overkill and visibly inefficient.

Ilya Popov provided a simple solution for this: cin can be untied from cout by casting another magical spell in addition to std::ios_base::sync_with_stdio(false);:

cin.tie(nullptr);

Also note that such forced flush also occurs when using std::endl instead of '\n' to produce an end of line on cout. Changing the output line to the more C++ idiomatic and innocent looking cout << num_of_trailing_zeros << endl; would degrade performance the same way.

Question 3

Another trick to make iostreams faster when you use both cin and cout is to call

cin.tie(nullptr);

By default, when you input anything from cin, it flushes cout. It can significantly harm performance if you do interleaved input and output. This is done for the command line interface uses, where you show some prompt and then wait for data:

std::string name;
cout << "Enter your name:";
cin >> name;

In this case you want to make sure the prompt is actually shown before you start waiting for input. With the line above you break that tie, cin and cout become independent.

Since C++11, one more way to achieve better performance with iostreams is to use std::getline together with std::stoi, like this:

std::string line;
for (int i = 0; i < n && std::getline(std::cin, line); ++i)
{
    int x = std::stoi(line);
}

This way can come close to C-style in performance, or even surpass scanf. Using getchar and especially getchar_unlocked together with handwritten parsing still provides better performance.

PS. I have written a post comparing several ways to input numbers in C++, useful for online judges, but it is only in Russian, sorry. The code samples and the final table, however, should be understandable.

Question 4

The problem is that, quoting cppreference:

any input from std::cin, output to std::cerr, or program termination forces a call to std::cout.flush()

This is easy to test: if you replace

cin >> fact_num;

with

scanf("%d", &fact_num);

and same for cin >> num_of_inputs but keep cout you'll get pretty much the same performance in your C++ version (or, rather IOStream version) as in C one:

The same happens if you keep cin but replace

cout << num_of_trailing_zeros << "\n";

with

printf("%d", num_of_trailing_zeros);
printf("%s","\n");

A simple solution is to untie cout and cin as mentioned by Ilya Popov:

cin.tie(nullptr);

Standard library implementations are allowed to omit the call to flush in certain cases, but not always. Here's a quote from C++14 27.7.2.1.3 (thanks to chqrlie):

Class basic_istream::sentry : First, if is.tie() is not a null pointer, the function calls is.tie()->flush() to synchronize the output sequence with any associated external C stream. Except that this call can be suppressed if the put area of is.tie() is empty. Further an implementation is allowed to defer the call to flush until a call of is.rdbuf()->underflow() occurs. If no such call occurs before the sentry object is destroyed, the call to flush may be eliminated entirely.

Answer 1

J'essayais de résoudre cet exercice depuis www.spoj.com: FCTRL - Factorial

Vous n'avez pas vraiment besoin de le lire, faites-le simplement si vous êtes curieux :)

Je l'ai d'abord implémenté en C ++ (voici ma solution):

#include <iostream>
using namespace std;

int main() {
    unsigned int num_of_inputs;
    unsigned int fact_num;
    unsigned int num_of_trailing_zeros;

    std::ios_base::sync_with_stdio(false); // turn off synchronization with the C library’s stdio buffers (from https://stackoverflow.com/a/22225421/5218277)

    cin >> num_of_inputs;

    while (num_of_inputs--)
    {
        cin >> fact_num;

        num_of_trailing_zeros = 0;

        for (unsigned int fives = 5; fives <= fact_num; fives *= 5)
            num_of_trailing_zeros += fact_num/fives;

        cout << num_of_trailing_zeros << "\n";
    }

    return 0;
}

I uploaded it as the solution for g++ 5.1

The result was: Time 0.18 Mem 3.3M

But then I saw some comments which claimed that their time execution was less than 0.1. Since I couldn't think about faster algorithm I tried to implement the same code in C:

#include <stdio.h>

int main() {
    unsigned int num_of_inputs;
    unsigned int fact_num;
    unsigned int num_of_trailing_zeros;

    scanf("%d", &num_of_inputs);

    while (num_of_inputs--)
    {
        scanf("%d", &fact_num);

        num_of_trailing_zeros = 0;

        for (unsigned int fives = 5; fives <= fact_num; fives *= 5)
            num_of_trailing_zeros += fact_num/fives;

        printf("%d", num_of_trailing_zeros);
        printf("%s","\n");
    }

    return 0;
}

I uploaded it as the solution for gcc 5.1

This time the result was: Time 0.02 Mem 2.1M

Now the code is almost the same, I added std::ios_base::sync_with_stdio(false); to the C++ code as was suggested here to turn off the synchronization with the C library’s stdio buffers. I also split the printf("%d\n", num_of_trailing_zeros); to printf("%d", num_of_trailing_zeros); printf("%s","\n"); to compensate for double call of operator<< in cout << num_of_trailing_zeros << "\n";.

But I still saw x9 better performance and lower memory usage in C vs. C++ code.

Why is that?

EDIT

I fixed unsigned long to unsigned int in the C code. It should have been unsigned int and the results which are shown above are related to the new (unsigned int) version.

Answer 2

31

C++ streams are extremely slow by design. Because slow and steady wins the race. :P (runs before I get flamed)

Mysticial

Answer 3

7

The slowness doesn't come from safety or adaptability. It's way overdesigned with all the stream flags.

Karoly Horvath

Answer 4

8

@AlexLop. Using a std::ostringstream to accumulate the output and sending it to std::cout all at once at the end brings the time down to 0.02. Using std::cout in a loop is simply slower in their environment and I don't think there's a simple way to improve it.

Blastfurnace

Answer 5

6

Is noone else concerned by the fact that these timings were obtained using ideone?

ildjarn

Answer 6

6

@Olaf: I'm afraid I disagree, this kind of question is very much on topic for all the chosen tags. C and C++ are sufficiently close in general that such a difference in performance begs for an explanation. I'm glad we found it. Maybe the GNU libc++ should be improved as a consequence.

chqrlie

Answer 7

Both programs do exactly the same thing. They use the same exact algorithm, and given its low complexity, their performance is mostly bound to efficiency of the input and output handling.

scanning the input with scanf("%d", &fact_num); on one side and cin >> fact_num; on the other does not seem very costly either way. In fact it should be less costly in C++ since the type of conversion is known at compile time and the correct parser can be invoked directly by the C++ compiler. The same holds for the output. You even make a point of writing a separate call for printf("%s","\n");, but the C compiler is good enough to compile this as a call to putchar('\n');.

So looking at the complexity of both the I/O and computation, the C++ version should be faster than the C version.

Completely disabling the buffering of stdout slows the C implementation to something even slower than the C++ version. Another test by AlexLop with an fflush(stdout); after the last printf yields similar performance as the C++ version. It is not as slow as completely disabling buffering because output is written to the system in small chunks instead of one byte at a time.

This seems to point to a specific behavior in your C++ library: I suspect your system's implementation of cin and cout flushes the output to cout when input is requested from cin. Some C libraries do this as well, but usually only when reading/writing to and from the terminal. The benchmarking done by the www.spoj.com site probably redirects input and output to and from files.

AlexLop did another test: reading all the inputs at once in a vector and subsequently computing and writing all output helps understanding why the C++ version is so much slower. It increases performance to that of the C version, this proves my point and removes suspicion on the C++ formatting code.

Another test by Blastfurnace, storing all outputs in an std::ostringstream and flushing that in one blast at the end, does improve the C++ performance to that of the basic C version. QED.

Interlacing input from cin and output to cout seems to cause very inefficient I/O handling, defeating the stream buffering scheme. reducing performance by a factor of 10.

PS: your algorithm is incorrect for fact_num >= UINT_MAX / 5 because fives *= 5 will overflow and wrap around before it becomes > fact_num. You can correct this by making fives an unsigned long or an unsigned long long if one of these types is larger than unsigned int. Also use %u as the scanf format. You are lucky the guys at www.spoj.com are not too strict in their benchmarks.

EDIT: As later explained by vitaux, this behavior is indeed mandated by the C++ standard. cin is tied to cout by default. An input operation from cin for which the input buffer needs refilling will cause cout to flush pending output. In the OP's implementation, cin seems to flush cout systematically, which is a bit overkill and visibly inefficient.

Ilya Popov provided a simple solution for this: cin can be untied from cout by casting another magical spell in addition to std::ios_base::sync_with_stdio(false);:

cin.tie(nullptr);

Also note that such forced flush also occurs when using std::endl instead of '\n' to produce an end of line on cout. Changing the output line to the more C++ idiomatic and innocent looking cout << num_of_trailing_zeros << endl; would degrade performance the same way.

Answer 8

2

You're probably right about stream flushing. Collecting the output in a std::ostringstream and outputting it all once at the end brings the time down to parity with the C version.

Blastfurnace

Answer 9

2

@ DavidC.Rankin: I ventured a conjecture (cout is flushed upon reading cin), devised a way to prove it, AlexLop implemented it and it does give convincing evidence, but Blastfurnace came up with a different way to prove my point and his tests give equally convincing evidence. I take it for proof, but of course it is not a completely formal proof, looking at the C++ source code could.

chqrlie

Answer 10

2

I tried using ostringstream for the output and it gave Time 0.02 QED :). Regarding the fact_num >= UINT_MAX / 5, GOOD point!

Alex Lop.

Answer 11

1

Collecting all the inputs into a vector and then processing the calculations (without ostringstream) gives the same result! Time 0.02. Combining both vector and ostringstream doesn't improve it more. Same Time 0.02

Alex Lop.

Answer 12

2

A simpler fix that works even if sizeof(int) == sizeof(long long) is this: add a test in the body of the loop after num_of_trailing_zeros += fact_num/fives; to check if fives has reached its maximum: if (fives > UINT_MAX / 5) break;

chqrlie

Answer 13

Another trick to make iostreams faster when you use both cin and cout is to call

cin.tie(nullptr);

By default, when you input anything from cin, it flushes cout. It can significantly harm performance if you do interleaved input and output. This is done for the command line interface uses, where you show some prompt and then wait for data:

std::string name;
cout << "Enter your name:";
cin >> name;

In this case you want to make sure the prompt is actually shown before you start waiting for input. With the line above you break that tie, cin and cout become independent.

Since C++11, one more way to achieve better performance with iostreams is to use std::getline together with std::stoi, like this:

std::string line;
for (int i = 0; i < n && std::getline(std::cin, line); ++i)
{
    int x = std::stoi(line);
}

This way can come close to C-style in performance, or even surpass scanf. Using getchar and especially getchar_unlocked together with handwritten parsing still provides better performance.

PS. I have written a post comparing several ways to input numbers in C++, useful for online judges, but it is only in Russian, sorry. The code samples and the final table, however, should be understandable.

Answer 14

1

Thank you for the explanation and +1 for the solution, but your proposed alternative with std::readline and std::stoi is not functionally equivalent to the OPs code. Both cin >> x; and scanf("%f", &x); accept ant whitespace as separator, there can be multiple numbers on the same line.

chqrlie

Answer 15

27

The problem is that, quoting cppreference:

any input from std::cin, output to std::cerr, or program termination forces a call to std::cout.flush()

This is easy to test: if you replace

cin >> fact_num;

with

scanf("%d", &fact_num);

and same for cin >> num_of_inputs but keep cout you'll get pretty much the same performance in your C++ version (or, rather IOStream version) as in C one:

The same happens if you keep cin but replace

cout << num_of_trailing_zeros << "\n";

with

printf("%d", num_of_trailing_zeros);
printf("%s","\n");

A simple solution is to untie cout and cin as mentioned by Ilya Popov:

cin.tie(nullptr);

Standard library implementations are allowed to omit the call to flush in certain cases, but not always. Here's a quote from C++14 27.7.2.1.3 (thanks to chqrlie):

Class basic_istream::sentry : First, if is.tie() is not a null pointer, the function calls is.tie()->flush() to synchronize the output sequence with any associated external C stream. Except that this call can be suppressed if the put area of is.tie() is empty. Further an implementation is allowed to defer the call to flush until a call of is.rdbuf()->underflow() occurs. If no such call occurs before the sentry object is destroyed, the call to flush may be eliminated entirely.

vitaut
la source

Thanks for the explanation. Yet quoting C++14 27.7.2.1.3: Class basic_istream::sentry : First, if is.tie() is not a null pointer, the function calls is.tie()->flush() to synchronize the output sequence with any associated external C stream. Except that this call can be suppressed if the put area of is.tie() is empty. Further an implementation is allowed to defer the call to flush until a call of is.rdbuf()->underflow() occurs. If no such call occurs before the sentry object is destroyed, the call to flush may be eliminated entirely.

chqrlie

As usual with C++, things are more complex than they look. The OP's C++ library is not as efficient as the Standard allows to be.

chqrlie

Thanks for the cppreference link. I don't like the "wrong answer" in the print screen though ☺

Alex Lop.

@AlexLop. Oops, fixed the "wrong answer" issue =). Forgot to update the other cin (this doesn't affect the timing though).

vitaut

@chqrlie Right, but even in the underflow case performance is likely to be worse compared to the stdio solution. Thanks for the standard ref.

vitaut

Answer 16

Thanks for the explanation. Yet quoting C++14 27.7.2.1.3: Class basic_istream::sentry : First, if is.tie() is not a null pointer, the function calls is.tie()->flush() to synchronize the output sequence with any associated external C stream. Except that this call can be suppressed if the put area of is.tie() is empty. Further an implementation is allowed to defer the call to flush until a call of is.rdbuf()->underflow() occurs. If no such call occurs before the sentry object is destroyed, the call to flush may be eliminated entirely.

chqrlie

Answer 17

As usual with C++, things are more complex than they look. The OP's C++ library is not as efficient as the Standard allows to be.

chqrlie

Answer 18

Thanks for the cppreference link. I don't like the "wrong answer" in the print screen though ☺

Alex Lop.

Answer 19

@AlexLop. Oops, fixed the "wrong answer" issue =). Forgot to update the other cin (this doesn't affect the timing though).

vitaut

Answer 20

@chqrlie Right, but even in the underflow case performance is likely to be worse compared to the stdio solution. Thanks for the standard ref.

vitaut

Grande différence (x9) dans le temps d'exécution entre du code presque identique en C et C ++

Réponses: