Mark system and internal symbols as private symbols in asm
Needs ReviewPublic

Authored by last_g on May 22 2018, 9:51 AM.

Details

Summary

This marks system and internal symbols as private in asm output so those random generated sysmbols won't appear in .symtab

Reasoning:

  • internal symbols don't help to debug because names are just random
  • the symbols style breaks perf logic
  • internal symbols can take ~75% of the .symtab. In the same time .symtab can take about 20% of the binary file size

Notice:
This diff mostly makes sense on top of the D4713 (or similar)

Test Plan

tests

Perf from D4713

     7.97%  FibbSlow  FibbSlow            [.] c3rM_info
     6.75%  FibbSlow  FibbSlow            [.] 0x000000000032cfa8
     6.63%  FibbSlow  FibbSlow            [.] cifA_info
     4.98%  FibbSlow  FibbSlow            [.] integerzmgmp_GHCziIntegerziType_eqIntegerzh_info
     4.55%  FibbSlow  FibbSlow            [.] chXn_info
     4.52%  FibbSlow  FibbSlow            [.] c3rH_info
     4.45%  FibbSlow  FibbSlow            [.] chZB_info
     4.04%  FibbSlow  FibbSlow            [.] Main_fibbzuslow_info
     4.03%  FibbSlow  FibbSlow            [.] stg_ap_0_fast
     3.76%  FibbSlow  FibbSlow            [.] chXA_info
     3.67%  FibbSlow  FibbSlow            [.] cifu_info
     3.25%  FibbSlow  FibbSlow            [.] ci4r_info
     2.64%  FibbSlow  FibbSlow            [.] s3rf_info
     2.42%  FibbSlow  FibbSlow            [.] s3rg_info
     2.39%  FibbSlow  FibbSlow            [.] integerzmgmp_GHCziIntegerziType_eqInteger_info
     2.25%  FibbSlow  FibbSlow            [.] integerzmgmp_GHCziIntegerziType_minusInteger_info
     2.17%  FibbSlow  FibbSlow            [.] ghczmprim_GHCziClasses_zeze_info
     2.09%  FibbSlow  FibbSlow            [.] cicc_info
     2.03%  FibbSlow  FibbSlow            [.] 0x0000000000331e15
     2.02%  FibbSlow  FibbSlow            [.] s3ri_info
     1.91%  FibbSlow  FibbSlow            [.] 0x0000000000331bb8
     1.89%  FibbSlow  FibbSlow            [.] ci4N_info
...

Perf from this patch:

15.37%  FibbSlow  FibbSlow            [.] Main_fibbzuslow_info
15.33%  FibbSlow  FibbSlow            [.] integerzmgmp_GHCziIntegerziType_minusInteger_info
13.34%  FibbSlow  FibbSlow            [.] integerzmgmp_GHCziIntegerziType_eqIntegerzh_info
 9.24%  FibbSlow  FibbSlow            [.] integerzmgmp_GHCziIntegerziType_plusInteger_info
 9.08%  FibbSlow  FibbSlow            [.] frame_dummy
 8.25%  FibbSlow  FibbSlow            [.] integerzmgmp_GHCziIntegerziType_eqInteger_info
 4.29%  FibbSlow  FibbSlow            [.] 0x0000000000321ab0
 3.84%  FibbSlow  FibbSlow            [.] stg_ap_0_fast
 3.07%  FibbSlow  FibbSlow            [.] ghczmprim_GHCziClasses_zeze_info
 2.39%  FibbSlow  FibbSlow            [.] 0x0000000000321ab7
 1.90%  FibbSlow  FibbSlow            [.] 0x00000000003266b8
 1.88%  FibbSlow  FibbSlow            [.] base_GHCziNum_zm_info
 1.83%  FibbSlow  FibbSlow            [.] 0x0000000000326915
 1.34%  FibbSlow  FibbSlow            [.] 0x00000000003248cc
 1.07%  FibbSlow  FibbSlow            [.] base_GHCziNum_zp_info
 0.98%  FibbSlow  FibbSlow            [.] 0x00000000003247c8
 0.80%  FibbSlow  FibbSlow            [.] 0x0000000000121498
 0.79%  FibbSlow  FibbSlow            [.] stg_gc_noregs
 0.75%  FibbSlow  FibbSlow            [.] 0x0000000000321ad6
 0.67%  FibbSlow  FibbSlow            [.] 0x0000000000321aca
 0.64%  FibbSlow  FibbSlow            [.] 0x0000000000321b4a
 0.61%  FibbSlow  FibbSlow            [.] 0x00000000002ff633
last_g created this revision.May 22 2018, 9:51 AM

I'm not sure yet whether this is the right way to go. I'd like to hear input from others (@bgamari, @niteria, @angerman, @olsner?)

Things on my mind:

  • this reduces the granularity of labels. The perf output in particular becomes is obviously less detailed, and also worryingly seems to have more unknown addresses in it.
  • I don't think we necessarily have any sensible ordering of symbols such that all of the code for a top-level function is laid out after the label for that function. We ought to look at that, perhaps it's possible to arrange that things are emitted in the right order either in Cmm or in the NCG.
  • What about LLVM?
  • Does this work with -ffunction-sections? Why? There are things I don't understand here.
last_g edited the summary of this revision. (Show Details)May 22 2018, 7:35 PM

this reduces the granularity of labels. The perf output in particular becomes is obviously less detailed, and also worryingly seems to have more unknown addresses in it.

It looks like unknown addresses came not from this diff but from D4713. Perf is able to group better without internal symbols so I effectively copy-pasted more unknowns.

simonmar added a comment.EditedMay 23 2018, 4:29 AM

@last_g I'm probably being dumb but I don't follow why D4713 would lead to more unknown addresses in the perf output, could you elaborate?

As @simonmar noted, the LLVM part is missing, I don't see any fundamental reasons why the LLVM backend could not support the same.
Similar to the -ffunction-sections question:

  • has this been tested on macOS with -dead_strip, which uses the .subsections_via_symbols directive?
  • The lack of symbols as @simonmar noted, is concerning to me as well, and I'm in the same boat, not making the connect with D4713 yet. Could you provide the same Perf-list without the D4713 interaction?

Overall striving to reduce the symtab is a nobel goal though.

@simonmar Knowing perf logic I can speculate why we do have more unknown addresses after D4713 but this actually requires a separate investigation. I can explain how perf treats symbols if you want.
As a proof that D4713 and not this diff leads to the increase of the unknown addresses:

@angerman

Could you provide the same Perf-list without the D4713 interaction?

This patch only makes sense on top of the D4713.
https://gist.github.com/last-g/0a7a3801478f8d072ab88fd6f9047406

simonmar requested changes to this revision.May 24 2018, 3:23 PM

Had a chat with @last_g about this in person today. The overall goal is good and this will improve GHC's interaction with perf. Let's figure out the concrete steps to get this in:

  • check that when we have local symbols that are part of a top-level declaration, that we're generating the code blocks in the right order and have a .size directive that covers the whole of the collection of blocks.
  • check what happens with non-exported top-level functions. I think these currently get internal symbols, which will mean that with this patch those will probably show up as unknown addresses in perf. (and what about gdb, without -g?). If so, we will probably want to keep the labels on non-exported top-level definitions.
  • check what happens with dead-stripping on OS X. Do we still strip the same stuff?
This revision now requires changes to proceed.May 24 2018, 3:23 PM
last_g added a comment.EditedTue, Jun 19, 9:15 PM

It took a while to answer :)

I don't think we necessarily have any sensible ordering of symbols such that all of the code for a top-level function is laid out after the label for that function. We ought to look at that; perhaps it's possible to arrange that things are emitted in the right order either in Cmm or in the NCG.

The .size directive relies on the assumption on reasonable order. And all the top level symbols are generated here in pprNatCmmDecl (https://phabricator.haskell.org/diffusion/GHC/browse/master/compiler/nativeGen/X86/Ppr.hs$84)

What about LLVM?

I didn't check it and I'm not planning to address it in this patch.

Does this work with -function-sections? Why? There are things I don't understand here.

I did very simple things like:
~/local/ghc-official/inplace/bin/ghc-stage2 -split-sections -optc -ffunction-sections -optl -ffunction-sections -fforce-recomp -dynamic FibbSlow.hs -O3
and also I built stage2 with options -split-sections -optc -ffunction-sections -optl -ffunction-sections and it works.

has this been tested on macOS with -dead_strip, which uses the .subsections_via_symbols directive?

I built stage2 and simple binaries on MacOS, and checked ghc -v so it runs gcc with -Wl,-dead_strip -Wl,-dead_strip_dylibs by default. Should it be some specific test case?
EDIT: the next sentence is not correct. Need to double check.
Dead strip preventing labels use another constructor: DeadStripPreventer but my changes only affect IdLabel.

check that when we have local symbols that are part of a top-level declaration, that we're generating the code blocks in the right order and have a .size directive that covers the whole of the collection of blocks.

Answered in the first comment. Also, I checked on ghc itself that every top-level symbol has size

check what happens with non-exported top-level functions. I think these currently get internal symbols, which will mean that with this patch those will probably show up as unknown addresses in perf. (and what about gdb, without -g?). If so, we will probably want to keep the labels on non-exported top-level definitions.

That's interesting. Non-exported functions are top-level functions in the output, and with this patch, we will generate private symbols effectively hiding them. Perf at the same time can provide different results depending on what compiler did, but yeah, we can lose a symbol. I think the proper solution is to generate local symbols with meaningful names for those functions. What do you think? Should I add it here or in the separate patch?

check what happens with dead-stripping on OS X. Do we still strip the same stuff?

Looks like.


@simonmar after investigation I found that there are at least three types of the info tables with internal names:

  1. An Info table inside of the top-level info table;
  2. A top-level info table for non-exported functions;
  3. Other :)

Symbols distribution in the simplest binary:

  • Total: 11013
    • Internal symbols (including _srt, _bytes etc): ~7000
      • internal info table symbols: 5332
        • type one symbols (i.e. nested info tables): 3443
    • Symbols with human-readable names from base package: 2000

Note: internal symbol == a symbol produced by pprUniqueAlways

@last_g if you do not intend to address LLVM in any way in this patch, does this mean it will potentially render the llvm backend broken? I'm not asking for the same functionality in the LLVM backend, just that we ensure not to break -fllvm.

@angerman I successfully built a simple binary and a stage2 compiler with -fllvm. Do you have an idea what can go wrong and what/how should I test?

@angerman , @simonmar I managed to get some odd behavior with -dead_strip like getting a warning during a link stage:

can't find atom for stabs BNSYM at 0000AA18 in /Users/lastg/Workspace/ghc-official-patches/libraries/base/dist-install/build/libHSbase-4.12.0.0.a(Internals.o)

but I still got a working binary.

I'll add a workaround for the dead_strip labels and top-level info table labels.

@last_g

I think my symbol-ordering question was a red herring. There are three kinds of non-exported top-level definitions:

  • Non-exported top-level functions from the original source code. We can give these readable symbols.
  • Compiler-generated top-level functions that are exported because their symbols are visible in the interface, perhaps in an unfolding. These have semi-readable names, like Main_main3_info. There's no guarantee that this code was part of main in the original source code, the compiler assigns names to these things based on where they are referenced from.
  • Compiler-generated top-level bindings that result from optimisation (e.g. expressions floated out), but aren't visible externally. These have no readable names, and with this patch I think we'll lose their symbols in perf, correct?

Do we care that perf will say 0x0000000000331e15 instead of s3ri_info? In the case of the symbol we can at least grep object files to find which module it came from (sometimes these are ambiguous though). With the address we can go into gdb and say list *0x0000000000331e15 and see the source code, provided we compiled with -g.

last_g updated this revision to Diff 17091.Mon, Jun 25, 3:33 PM

Always keep top-level fn symbols and dsp

@simonmar

Do we care that perf will say 0x0000000000331e15 instead of s3ri_info?

I don't think there is any difference for a normal user between an address and a random symbol.

With the address we can go into gdb and say list *0x0000000000331e15 and see the source code, provided we compiled with -g.

I've tested a previous version with gdb and -g: all symbols are still in dwarf (even .Lsmth) and gdb manages to work with them.

I changed the patch to always keep top-level code symbols and added some logic to guarantee persistence for dsp labels.

I also noticed that we can actually generate symbols for non-exported functions in -O1+.

simonmar added inline comments.Tue, Jul 10, 9:03 AM
compiler/cmm/CLabel.hs
1156

"won't be temp" maybe? Could you expand the comment a bit to say why?

1195

char 'b'

1272

hmmm, this isn't the same as top-level. Only a subset of top-level things have hasCAF lbl == True.

1273–1286

Ok, I've been thinking about this some more, especially in the light of your comment that gdb can see the .Lxxx symbols. That means we don't lose anything by turning a compiler-generated symbol into a .L symbol, at least as far as gdb is concerned. We might lose some details in perf, but as you say maybe we don't care about the difference between a raw address and a compiler-generated symbol.

The condition we should probably use is isExternalName, since this corresponds to exactly those symbols that have readable names and must be consistent across modules. Which is almost what you had before, except that you had isInternalName || isSystemName, and I think isSystemName is redundant.