*** linuxjacques <linuxjacques!~jacques@nslu2-linux/jacques> has quit IRC | 00:07 | |
*** linuxjacques <linuxjacques!~jacques@nslu2-linux/jacques> has joined #yocto | 00:08 | |
*** moto-timo <moto-timo!~ttorling@fsf/member/moto-timo> has quit IRC | 00:46 | |
*** cp <cp!~cp@b157153.ppp.asahi-net.or.jp> has quit IRC | 01:22 | |
*** cp <cp!~cp@b157153.ppp.asahi-net.or.jp> has joined #yocto | 01:24 | |
*** chinhuat6 <chinhuat6!~chinhuat@192.198.146.173> has joined #yocto | 02:04 | |
*** chinhuat <chinhuat!~chinhuat@192.198.146.173> has quit IRC | 02:05 | |
*** georgem_home <georgem_home!uid210681@gateway/web/irccloud.com/x-ibejuaonblxypjxw> has quit IRC | 03:06 | |
*** learningc <learningc!~learningc@mti-37-145.tm.net.my> has joined #yocto | 03:19 | |
*** behanw <behanw!uid110099@gateway/web/irccloud.com/x-erdyoaqratczusxr> has joined #yocto | 03:40 | |
*** learningc <learningc!~learningc@mti-37-145.tm.net.my> has quit IRC | 04:10 | |
*** learningc <learningc!~learningc@mti-37-145.tm.net.my> has joined #yocto | 04:11 | |
*** Dvorkin <Dvorkin!~Dvorkin@176.114.204.12> has quit IRC | 04:26 | |
*** Bunio_FH <Bunio_FH!~bunio@clj-165.netdrive.pl> has quit IRC | 04:38 | |
*** AndersD <AndersD!~AndersD@h-98-128-162-82.NA.cust.bahnhof.se> has joined #yocto | 05:27 | |
*** asabil <asabil!~asabil@81.167.213.26.static.lyse.net> has joined #yocto | 06:10 | |
*** jeanba <jeanba!~jbl@77.243.63.34> has joined #yocto | 06:16 | |
*** jeanba <jeanba!~jbl@77.243.63.34> has left #yocto | 06:16 | |
*** TobSnyder <TobSnyder!~schneider@ip5f5aa32f.dynamic.kabel-deutschland.de> has joined #yocto | 06:19 | |
*** chinhuat69 <chinhuat69!~chinhuat@192.198.146.173> has joined #yocto | 06:25 | |
*** chinhuat69 is now known as chinhuat | 06:26 | |
*** chinhuat6 <chinhuat6!~chinhuat@192.198.146.173> has quit IRC | 06:27 | |
*** agust <agust!~agust@p54833DBB.dip0.t-ipconnect.de> has joined #yocto | 06:31 | |
*** asabil <asabil!~asabil@81.167.213.26.static.lyse.net> has quit IRC | 06:54 | |
*** asabil <asabil!~asabil@81.167.213.26.static.lyse.net> has joined #yocto | 06:55 | |
*** jmiehe <jmiehe!~Thunderbi@p578c106e.dip0.t-ipconnect.de> has joined #yocto | 06:59 | |
*** T_UNIX <T_UNIX!uid218288@gateway/web/irccloud.com/x-ziblbfxkpvkrfmxu> has joined #yocto | 07:26 | |
*** Bunio_FH <Bunio_FH!~bunio@81-18-201-214.static.chello.pl> has joined #yocto | 07:46 | |
*** learningc <learningc!~learningc@mti-37-145.tm.net.my> has quit IRC | 08:01 | |
*** learningc <learningc!~learningc@mti-37-145.tm.net.my> has joined #yocto | 08:03 | |
*** florian <florian!~florian_k@Maemo/community/contributor/florian> has joined #yocto | 08:13 | |
*** qt-x <qt-x!50614037@80.97.64.55> has joined #yocto | 08:21 | |
qt-x | Can a recipe be made to build multiple images? e.g. bitbake multi-image | 08:24 |
qt-x | where multi-image.bb instructs to build 3 or more images | 08:26 |
RP | qt-x: yes, just have them as dependencies: do_sometask[depends] = "image1:do_image_complete image2:do_image_complete image3:do_image_complete" | 08:28 |
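A minimal sketch of the wrapper recipe RP describes (untested; multi-image.bb and the three image recipe names are placeholders for whatever image recipes exist in your layers):

```bitbake
# multi-image.bb -- builds nothing itself; pulls three image recipes
# in through task dependencies, so 'bitbake multi-image' builds all of them.
SUMMARY = "Wrapper recipe that builds several images at once"
LICENSE = "MIT"

# An empty wrapper needs no toolchain/libc dependencies
INHIBIT_DEFAULT_DEPS = "1"

do_build[depends] = "image1:do_image_complete \
                     image2:do_image_complete \
                     image3:do_image_complete"
```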
Saur | RP: Regarding that ResourceWarning I mentioned yesterday, I am building without the hash server enabled. | 08:30 |
qt-x | awesome thanks RP | 08:30 |
*** jeanba <jeanba!~jbl@77.243.63.34> has joined #yocto | 08:37 | |
*** jeanba <jeanba!~jbl@77.243.63.34> has left #yocto | 08:37 | |
*** florian <florian!~florian_k@Maemo/community/contributor/florian> has quit IRC | 08:38 | |
*** gaulishcoin <gaulishcoin!~gaulishco@anice-652-1-127-215.w83-201.abo.wanadoo.fr> has joined #yocto | 08:41 | |
*** alimon <alimon!alimon@gateway/shell/linaro/x-muzwthdlgbmewyai> has quit IRC | 08:41 | |
*** jofr <jofr!~jofr@193.182.166.3> has quit IRC | 08:47 | |
*** jofr <jofr!~jofr@193.182.166.3> has joined #yocto | 08:48 | |
asabil | Hi everyone | 08:49 |
asabil | I was wondering why, when an initramfs is built and bundled with bitbake, the kernel with the bundled initramfs is not actually packaged | 08:50 |
asabil | and is only left lying around in the build tree | 08:50 |
yocti | New news from stackoverflow: /dev/fd/ socket or pipe links fail, NOT missing /dev/fd link <https://stackoverflow.com/questions/57498881/dev-fd-socket-or-pipe-links-fail-not-missing-dev-fd-link> | 08:56 |
*** BCMM <BCMM!~BCMM@unaffiliated/bcmm> has joined #yocto | 08:56 | |
*** florian <florian!~florian_k@Maemo/community/contributor/florian> has joined #yocto | 09:07 | |
*** gaulishcoin <gaulishcoin!~gaulishco@anice-652-1-127-215.w83-201.abo.wanadoo.fr> has quit IRC | 09:07 | |
asabil | In my case Image.gz.initramfs is created but only Image.gz is packaged | 09:11 |
asabil | This is inside kernel.bbclass | 09:12 |
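For context, the bundling asabil is describing is normally enabled like this (a local.conf sketch; the initramfs image name is illustrative), which makes kernel.bbclass produce the bundled artifact such as Image.gz.initramfs:

```bitbake
# local.conf -- build an initramfs image and glue it into the kernel binary
INITRAMFS_IMAGE = "core-image-minimal-initramfs"
INITRAMFS_IMAGE_BUNDLE = "1"
```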
*** asabil <asabil!~asabil@81.167.213.26.static.lyse.net> has quit IRC | 09:18 | |
*** asabil <asabil!~asabil@81.167.213.26.static.lyse.net> has joined #yocto | 09:19 | |
*** edgar444 <edgar444!uid214381@gateway/web/irccloud.com/x-occuhnpylrldtags> has joined #yocto | 09:28 | |
*** learningc <learningc!~learningc@mti-37-145.tm.net.my> has quit IRC | 09:45 | |
*** opennandra <opennandra!~marek@109-230-35-25.dynamic.orange.sk> has joined #yocto | 09:45 | |
*** opennandra <opennandra!~marek@109-230-35-25.dynamic.orange.sk> has quit IRC | 09:56 | |
*** learningc <learningc!~learningc@121.122.92.39> has joined #yocto | 10:06 | |
*** bluelightning_ <bluelightning_!~paul@pdpc/supporter/professional/bluelightning> has joined #yocto | 10:10 | |
*** bluelightning <bluelightning!~paul@pdpc/supporter/professional/bluelightning> has quit IRC | 10:15 | |
*** bluelightning_ <bluelightning_!~paul@pdpc/supporter/professional/bluelightning> has quit IRC | 10:31 | |
*** dmoseley_ <dmoseley_!~dmoseley@user-24-236-82-253.knology.net> has quit IRC | 10:34 | |
*** bluelightning_ <bluelightning_!~paul@pdpc/supporter/professional/bluelightning> has joined #yocto | 10:35 | |
*** vmeson <vmeson!~rmacleod@24-52-238-240.cable.teksavvy.com> has quit IRC | 10:56 | |
*** goliath <goliath!~goliath@clnet-p04-043.ikbnet.co.at> has joined #yocto | 10:58 | |
*** asabil <asabil!~asabil@81.167.213.26.static.lyse.net> has quit IRC | 11:02 | |
*** asabil <asabil!~asabil@81.167.213.26.static.lyse.net> has joined #yocto | 11:03 | |
*** pung_ <pung_!~BobPungar@187.113.136.3> has joined #yocto | 11:16 | |
*** BobPungartnik <BobPungartnik!~BobPungar@187.113.128.138> has quit IRC | 11:21 | |
*** blueness <blueness!~blueness@gentoo/developer/blueness> has quit IRC | 11:22 | |
*** kroon <kroon!~kroon@213.185.29.22> has joined #yocto | 11:24 | |
ndec | LetoThe2nd: https://www.youtube.com/watch?v=Itn_at7kfVw | 11:25 |
Crofton|work | scary thing to see in the morning! | 11:33 |
*** berton <berton!~berton@181.220.83.67> has joined #yocto | 11:43 | |
*** Chrusel <Chrusel!c1669b04@193.102.155.4> has joined #yocto | 12:23 | |
asabil | Does anyone have any input regarding the question I asked earlier? | 12:28 |
RP | asabil: initramfs depends on a lot of different things, some images include them, some don't. It could also vary depending on the target architecture, kernel and machine. Basically it depends on a lot of different things | 12:29 |
asabil | RP: yes I know, my problem is with the kernel.bbclass | 12:30 |
asabil | it generates <Image>.initramfs and then forgets about it | 12:30 |
asabil | it feels like a bug to me, but I am not sure | 12:30 |
asabil | https://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/classes/kernel.bbclass#n256 | 12:31 |
*** Dvorkin <Dvorkin!~Dvorkin@176.114.204.12> has joined #yocto | 12:31 | |
Dvorkin | How do I overwrite a default Kconfig value using a meta layer? | 12:32 |
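Dvorkin's question gets no answer in the log; for linux-yocto-style kernel recipes the usual mechanism is a Kconfig fragment added from a bbappend (a sketch; the recipe and fragment names are hypothetical):

```bitbake
# linux-yocto_%.bbappend in your meta layer
FILESEXTRAPATHS_prepend := "${THISDIR}/files:"
# files/enable-foo.cfg contains the Kconfig lines to override, e.g. CONFIG_FOO=y
SRC_URI += "file://enable-foo.cfg"
```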
*** opennandra <opennandra!~marek@109-230-35-25.dynamic.orange.sk> has joined #yocto | 12:32 | |
opennandra | hello | 12:32 |
opennandra | I have a vendor SDK which has a heavily patched u-boot + kernel + rootfs | 12:32 |
asabil | and then this is where it specifies the Package files https://git.yoctoproject.org/cgit/cgit.cgi/poky/tree/meta/classes/kernel.bbclass#n93 | 12:33 |
opennandra | I plan to use the SDK's u-boot and kernel and move building the rest to yocto | 12:33 |
opennandra | the project uses the 4.9 Linaro toolchain | 12:33 |
opennandra | my target is poky rocko | 12:33 |
asabil | nothing ever refers to $imageType.initramfs | 12:33 |
opennandra | but I'm having some issues like: ERROR: gmp-6.1.2-r0 do_package_qa: QA Issue: libgmpxx rdepends on external-linaro-toolchain-dbg [debug-deps] | 12:34 |
opennandra | ERROR: gmp-6.1.2-r0 do_package_qa: QA Issue: /usr/lib/libgmpxx.so.4.5.2 contained in package libgmpxx requires libstdc++.so.6(CXXABI_1.3), but no providers found in RDEPENDS_libgmpxx? [file-rdeps] | 12:34 |
opennandra | ERROR: gmp-6.1.2-r0 do_package_qa: QA Issue: /usr/lib/libgmpxx.so.4.5.2 contained in package libgmpxx requires libstdc++.so.6(CXXABI_ARM_1.3.3), but no providers found in RDEPENDS_libgmpxx? [file-rdeps] | 12:34 |
opennandra | ERROR: gmp-6.1.2-r0 do_package_qa: QA Issue: /usr/lib/libgmpxx.so.4.5.2 contained in package libgmpxx requires libstdc++.so.6(GLIBCXX_3.4), but no providers found in RDEPENDS_libgmpxx? [file-rdeps] | 12:34 |
opennandra | ERROR: gmp-6.1.2-r0 do_package_qa: QA Issue: /usr/lib/libgmpxx.so.4.5.2 contained in package libgmpxx requires libstdc++.so.6(GLIBCXX_3.4.11), but no providers found in RDEPENDS_libgmpxx? [file-rdeps] | 12:34 |
opennandra | ERROR: gmp-6.1.2-r0 do_package_qa: QA Issue: /usr/lib/libgmpxx.so.4.5.2 contained in package libgmpxx requires libstdc++.so.6, but no providers found in RDEPENDS_libgmpxx? [file-rdeps] | 12:34 |
opennandra | is it even a good idea to try it like that? | 12:34 |
*** georgem_home <georgem_home!uid210681@gateway/web/irccloud.com/x-fqfuyhgvjkzaeenl> has joined #yocto | 12:36 | |
RP | opennandra: the shlibs automatic dependency code is confused as it can't figure out what should be providing libstdc++ | 12:38 |
RP | opennandra: a question for the supplier of this external toolchain | 12:38 |
opennandra | RP: looks like it's a QA issue, so can I just suppress it and try? | 12:39 |
opennandra | does this thing even make sense? | 12:39 |
*** georgem <georgem!~georgem@216.21.169.52> has quit IRC | 12:46 | |
jwessel | RP: I have some more information about the glibc-locale pseudo issue, but I still don't really understand the nature of the problem yet. | 12:46 |
*** asabil <asabil!~asabil@81.167.213.26.static.lyse.net> has quit IRC | 12:47 | |
*** georgem <georgem!~georgem@216.21.169.52> has joined #yocto | 12:47 | |
jwessel | It took 11 hours to get an instrumented pseudo to reproduce the problem. It just failed a bit ago. | 12:47 |
*** asabil <asabil!~asabil@81.167.213.26.static.lyse.net> has joined #yocto | 12:47 | |
jwessel | https://pastebin.com/yHc0HSme | 12:47 |
jwessel | I don't know how much you looked at it, but there is a filtered log with the important bits stripped out. | 12:47 |
jwessel | I am trying to understand the sequence of events that leads pseudo to decide the uid is wrong. | 12:49 |
*** qt-x <qt-x!50614037@80.97.64.55> has quit IRC | 12:55 | |
*** kaspter <kaspter!~Instantbi@183.128.238.14> has quit IRC | 12:59 | |
RP | opennandra: It's a warning that something is seriously wrong with your build. Sure, you can turn it off, but that doesn't fix it. | 13:12 |
RP | jwessel: interesting. Let me see if I can page in from swap :) | 13:13 |
opennandra | RP: ok, what is the suggested way then? Use some older yocto release? | 13:13 |
opennandra | RP: I need to build stuff with an external toolchain | 13:13 |
*** JPEW <JPEW!cc4da337@204.77.163.55> has joined #yocto | 13:15 | |
*** asabil <asabil!~asabil@81.167.213.26.static.lyse.net> has quit IRC | 13:16 | |
RP | opennandra: Fix the external toolchain, or ask the supplier of the external toolchain why it's broken. I know nothing about it so I can't really help. I know what that error means but I don't know what the correct fix is or anything about the toolchain. I do know that turning off the check will just make it fail later | 13:17 |
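For completeness, suppressing the checks opennandra hit would look roughly like this in a gmp bbappend (a sketch; as RP warns, this hides the breakage rather than fixing it):

```bitbake
# gmp_%.bbappend -- skip the two QA checks for the libgmpxx package only
INSANE_SKIP_libgmpxx += "file-rdeps debug-deps"
```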
RP | jwessel: the other interesting thing is "mode 100600" - where did that come from... | 13:18 |
opennandra | RP: ok thanks a lot, I'll ask on mailing list then | 13:18 |
RP | jwessel: I've always thought that there was stale data in pseudo's db that by chance happens to corrupt a new file | 13:19 |
RP | jwessel: If I was right about that, the question is where did the bad info come from (stale inode?) | 13:19 |
RP | jwessel: how complete are your logs? can you tell if that inode has any previous history with an unrelated file? | 13:20 |
Saur | RP: Is there some way to tell bitbake to copy files from SSTATE_MIRRORS rather than creating symbolic links to them? In our case we have the global sstate cache on an NFS mount and I would prefer to copy the files to the local sstate cache rather than having to retrieve them via NFS each time. | 13:22 |
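The setup Saur describes looks something like this in local.conf (the NFS mount point is made up; the literal PATH token is required and is substituted by the sstate code, which symlinks matching objects into the local cache rather than copying them):

```bitbake
# Shared sstate cache served from an NFS mount
SSTATE_MIRRORS ?= "file://.* file:///mnt/nfs/sstate-cache/PATH"
```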
*** vmeson <vmeson!~rmacleod@24-52-238-240.cable.teksavvy.com> has joined #yocto | 13:24 | |
*** nabokov <nabokov!~armand@67.218.223.154> has joined #yocto | 13:25 | |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has joined #yocto | 13:28 | |
*** bluelightning_ <bluelightning_!~paul@pdpc/supporter/professional/bluelightning> has quit IRC | 13:30 | |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has quit IRC | 13:30 | |
RP | Saur: you'd have to tweak the fetcher code afaik | 13:31 |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has joined #yocto | 13:31 | |
Saur | RP: Ok. Would it be acceptable to add a way to force copying instead of linking? Either globally, or perhaps some way to do it per URL in SSTATE_MIRRORS? | 13:33 |
RP | Saur: My reservation is just about more codepaths combinations in the fetcher code :( | 13:33 |
RP | per url in sstate_mirrors sounds like some kind of nightmare | 13:34 |
Saur | Probably not too easy to add either. Globally should probably be a lot easier. | 13:34 |
Saur | Ok, I'll have a look at the code and see what it would involve. | 13:35 |
RP | Saur: The trouble is each time you add a binary "yes/no" decision for a feature like that into the fetcher, it doubles our test matrix | 13:36 |
RP | Given the people we have maintaining it (or not), I'm rather averse to such controls | 13:37 |
Saur | RP: Yeah, I know. At the same time, having the symbolic links to an NFS mount is less than optimal. Especially when there is a network failure and the NFS mount is gone for a day due to IT not being able to get the network working (yes, we had that the other day) :P | 13:38 |
JPEW | Saur: You might be able to expose it over HTTP also? | 13:39 |
Saur | JPEW: Yeah, that is an alternative, but then it becomes a matter of authentication... With the NFS mount that is taken care of by who can mount it... | 13:40 |
JPEW | Saur: Ah, ya that gets a little tricky | 13:40 |
*** kaspter <kaspter!~Instantbi@183.128.238.14> has joined #yocto | 13:40 | |
RP | JPEW: btw, I found another data corruption bug in runqueue. Hoping this explains a few weird things! | 13:42 |
JPEW | RP: Cool. Did you get any stats from the AB? | 13:42 |
RP | Saur: It's a tricky one. Once I accept such a patch we're stuck trying to maintain that API effectively indefinitely though :/ | 13:42 |
RP | JPEW: {"connections": {"total_time": 2058.1817467394285, "max_time": 0.2676054770126939, "num": 1772291, "average": 0.0011613114024386676, "stdev": 0.003402929594519231}, "requests": {"total_time": 1224.0615269057453, "max_time": 0.26773543702438474, "num": 1772290, "average": 0.0006906666103773904, "stdev": 0.0005487492249695723}} | 13:43 |
RP | JPEW: that was after a single build approximately completed | 13:43 |
JPEW | RP: Any connection timeouts? | 13:43 |
RP | JPEW: loads | 13:43 |
RP | JPEW: https://autobuilder.yoctoproject.org/typhoon/#/builders/83/builds/331 - any warning is a timeout | 13:44 |
JPEW | Hmm. Ok, that's unfortunate. It means my stats probably aren't capturing where the timeout happens :( | 13:44 |
Saur | RP: Yet without the possibility to add those kinds of tweaks, my hands are rather tied, as I have no way of doing local modifications to bitbake, compared to classes and recipes that I can copy/append to locally. | 13:44 |
RP | JPEW: I think it just means the server can't handle enough requests to stop some connections stalling | 13:44 |
JPEW | i.e. the kernel can't accept any more connections? Ya, that seems likely | 13:45 |
JPEW | 1772290 * (0.0011 + 0.0007) = 3190 seconds | 13:46 |
RP | Saur: you can monkey patch bitbake. You'd just not *like* to do that | 13:46 |
Saur | No, I definitely don't like the idea of doing that... | 13:47 |
JPEW | RP: So, I think we can say that connections are being serviced in a reasonably timely fashion once the kernel actually allows them | 13:48 |
jwessel | RP: That was included in the log I posted. | 13:48 |
jwessel | That particular inode was used earlier, but it was still good at the time. | 13:49 |
jwessel | I am not exactly sure if it is DB corruption, or some kind of odd race condition. | 13:51 |
jwessel | I suspect I'll have to add additional logging information but I am not sure what to add yet. | 13:51 |
jwessel | This is the first time we have caught it "red handed", so to speak, at the moment the bad entry is entered into the DB. | 13:52 |
jwessel | I'd like to be able to create a standalone test that emits the same kind of log entries. | 13:54 |
jwessel | What I don't know is if the client half of the operation is where things went bad. This is only the server side. I wasn't sure if the client picks up the UID info and just passes it along, or if the server is making some kind of decision. To me it looked like a brand new entry. | 13:55 |
RP | jwessel: this is the challenge with debugging this, its very hard to tell | 13:57 |
jwessel | I'll have to go read some more code and such. I have only been working on this intermittently, so I thought I might post what I had. | 13:57 |
RP | JPEW: I guess we need more threads answering the connections in parallel? | 13:58 |
jwessel | What we know now definitively is that it is a re-used inode and it has something to do with the hard links and mv operations. | 13:58 |
RP | jwessel: its useful, I'm just also unfortunately in the middle of a complex mess with runqueue :( | 13:58 |
RP | jwessel: yes | 13:59 |
jwessel | I am in the middle of 2 or 3 other things myself :-) | 13:59 |
JPEW | Ya, I think we should make the server use the socketserver.ThreadingMixIn class to thread the server, then make the siggen code use a persistent connection | 13:59 |
RP | JPEW: doesn't that code use a thread per connection? | 13:59 |
RP | JPEW: I'm not really willing to go that far, bit risky | 14:00 |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has quit IRC | 14:00 | |
JPEW | If you want to share a persistent connection, it makes more sense because then it won't run out of threads to handle new connections (or you need a thread pool that is bounded by the maximum number of clients you expect at any given time) | 14:01 |
*** TobSnyder <TobSnyder!~schneider@ip5f5aa32f.dynamic.kabel-deutschland.de> has quit IRC | 14:01 | |
RP | JPEW: I suspect the current design can handle persistent connections, it's the sheer number which is overloading it | 14:02 |
RP | JPEW: I suspect a thread pool may be easier than persisting though :/ | 14:02 |
JPEW | RP: Not with one thread.... the single thread will handle only one connection until it closes, so it would block all others | 14:03 |
*** opennandra <opennandra!~marek@109-230-35-25.dynamic.orange.sk> has quit IRC | 14:04 | |
*** tijko <tijko!~tijko@unaffiliated/tijko> has joined #yocto | 14:04 | |
JPEW | RP: There are easy ways to make the connection persistent, just not with stock python modules. | 14:04 |
RP | JPEW: hmm, I thought the thread worked differently to that :/ | 14:05 |
JPEW | RP: let me look, I might be confusing something | 14:06 |
JPEW | RP: OK, I was right. The HTTP handler base class processes requests on the connection until it closes: https://github.com/python/cpython/blob/3.7/Lib/http/server.py#L422 | 14:08 |
JPEW | You could, I suppose, pass *those* all off to yet another thread, but that seems messy | 14:09 |
RP | JPEW: I think we're seeing it differently in that we're thinking about different threads | 14:09 |
RP | JPEW: I'm looking at it from the perspective of the sockets being opened by the server. There is a thread dedicated to doing that and queuing them up which isn't blocked on closing | 14:10 |
*** FailDev <FailDev!18d83107@24.216.49.7> has joined #yocto | 14:10 | |
*** kaspter <kaspter!~Instantbi@183.128.238.14> has quit IRC | 14:12 | |
*** kaspter <kaspter!~Instantbi@183.128.238.14> has joined #yocto | 14:13 | |
JPEW | RP: Correct... I was running ahead and trying to think about persistent connections and threads. I don't really see how adding more threads would help with non-persistent connections? | 14:14 |
*** armpit <armpit!~armpit@2601:202:4180:c33:bf:ea8b:b284:1e7e> has quit IRC | 14:14 | |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has joined #yocto | 14:14 | |
RP | JPEW: it depends where our bottleneck is and I guess on that we're still perhaps not quite in agreement | 14:15 |
JPEW | RP: Ok, right. I think (assuming my stat code is correct) that the server handles requests timely once it actually accepts() them. The one metric we don't have is the amount of time a connection is pending in the kernel before userspace calls accept() to get it. | 14:19 |
JPEW | So either 1) The connections are waiting for a long period of time in the listen queue before userspace calls accept() them | 14:20 |
RP | JPEW: think about the maths: we have 40 different autobuilder targets, each with 9000 tasks starting in parallel | 14:20 |
RP | JPEW: that means 360,000 requests approximately in parallel which we need to answer in less than two minutes. Can the server do that? | 14:21 |
RP | JPEW: with the average connection time it would take 418s; with the average request time, 248s; both of which are > 120s | 14:22 |
RP | JPEW: so I suspect even the request time is too slow :/ | 14:23 |
RP | (I'm open to persuasion I'm missing something) | 14:23 |
*** Chrusel <Chrusel!c1669b04@193.102.155.4> has quit IRC | 14:24 | |
JPEW | RP: You are correct. I suspect adding one more thread to the pool would cut the request time almost in half... the request time includes the I/O time required to read the data from TCP socket as well as write the results back. | 14:24 |
*** tijko <tijko!~tijko@unaffiliated/tijko> has quit IRC | 14:25 | |
*** armpit <armpit!~armpit@2601:202:4180:c33:7c98:5faa:262d:a3af> has joined #yocto | 14:25 | |
RP | JPEW: right, we probably ideally want a pool of around 5-10 | 14:28 |
*** tijko <tijko!~tijko@unaffiliated/tijko> has joined #yocto | 14:28 | |
*** dreyna <dreyna!~dreyna@c-24-5-28-247.hsd1.ca.comcast.net> has joined #yocto | 14:28 | |
JPEW | RP: I think doing that will significantly reduce the request time | 14:28 |
JPEW | RP: But the connect time on its own is still too long... I suppose it's possible the reduction in request time would also reduce the connection time | 14:30 |
JPEW | RP: Which actually seems pretty likely. The connect time can't *possibly* be shorter (on average) than the request time if the server is running full tilt with a single request-handling thread | 14:31 |
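The socketserver threading approach under discussion can be sketched as a standalone toy (not the actual hash server code; the handler is made up, and Python 3.7+ ships this mixin combination ready-made as http.server.ThreadingHTTPServer):

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn

# Mixing ThreadingMixIn into HTTPServer hands each accepted connection
# to its own thread, so one slow client no longer blocks the rest.
class ThreadingHTTPServer(ThreadingMixIn, HTTPServer):
    daemon_threads = True

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

# Bind an ephemeral port and serve from a background thread
server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

port = server.server_address[1]
reply = urllib.request.urlopen("http://127.0.0.1:%d/" % port).read()
print(reply)  # b'ok'
server.shutdown()
```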
*** JaMa <JaMa!~martin@ip-217-030-068-212.aim-net.cz> has joined #yocto | 14:37 | |
JaMa | either my builds are much bigger or bitbake on one of the most powerful servers I have access to is still slower than what people use; will send the parsing times to the ML "shortly", last test currently sitting at "No currently running tasks (28875 of 71749)" | 14:40 |
*** FailDev <FailDev!18d83107@24.216.49.7> has quit IRC | 14:45 | |
*** Bunio_FH <Bunio_FH!~bunio@81-18-201-214.static.chello.pl> has quit IRC | 14:54 | |
*** kaspter <kaspter!~Instantbi@183.128.238.14> has quit IRC | 14:57 | |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has quit IRC | 15:00 | |
*** asabil <asabil!~asabil@2a01:79d:7375:2ca4:fd3d:19c0:f6c1:5aea> has joined #yocto | 15:00 | |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has joined #yocto | 15:02 | |
*** FailDev <FailDev!18d83107@24.216.49.7> has joined #yocto | 15:03 | |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has quit IRC | 15:06 | |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has joined #yocto | 15:08 | |
*** kaspter <kaspter!~Instantbi@183.128.238.14> has joined #yocto | 15:16 | |
*** edgar444 <edgar444!uid214381@gateway/web/irccloud.com/x-occuhnpylrldtags> has quit IRC | 15:17 | |
*** asabil <asabil!~asabil@2a01:79d:7375:2ca4:fd3d:19c0:f6c1:5aea> has quit IRC | 15:18 | |
*** kroon <kroon!~kroon@213.185.29.22> has quit IRC | 15:23 | |
RP | JaMa: or you're hitting the same inotify issue that kanavin's profile seemed to show... | 15:35 |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has quit IRC | 15:52 | |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has joined #yocto | 15:59 | |
*** goliath <goliath!~goliath@clnet-p04-043.ikbnet.co.at> has quit IRC | 16:02 | |
JaMa | RP: yes, we'll see -P is running now | 16:05 |
JaMa | RP: those "ResourceWarning: unclosed" warnings are about the socket of PRServer on localhost, now I see in some builds that it even started PRServer twice | 16:08 |
JaMa | NOTE: Started PRServer with DBfile: prserv.sqlite3, IP: 127.0.0.1, PORT: 44707, PID: 3947 | 16:08 |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has quit IRC | 16:08 | |
JaMa | NOTE: Terminating PRServer... | 16:08 |
JaMa | NOTE: Started PRServer with prserv.sqlite3, IP: 127.0.0.1, PORT: 42189, PID: 3949 | 16:08 |
JaMa | bb/codeparser.py:419: ResourceWarning: unclosed <socket.socket fd=10, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=0, laddr=('127.0.0.1', 44707)> "while_clause": lambda x: (chain(x.condition, x.cmds), None), | 16:08 |
JaMa | and it's triggered from various places (not just codeparser.py:419) | 16:09 |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has joined #yocto | 16:09 | |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has quit IRC | 16:11 | |
*** jmiehe <jmiehe!~Thunderbi@p578c106e.dip0.t-ipconnect.de> has quit IRC | 16:12 | |
JaMa | RP: if I return at "Executing tasks", then it took only 6mins and there is only one notification in the profile (21 config_notification) | 16:16 |
RP | JaMa: so it's the iterating through the tasks part that is slow for you? | 16:18 |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has joined #yocto | 16:21 | |
RP | JaMa: I think kanavin's numbers were Ctrl+C before executing tasks | 16:22 |
JaMa | RP: looks like it, will let it run the whole build with -n (with latest master-next it was "only" 90 mins) | 16:22 |
RP | JaMa: that is *way* too long | 16:22 |
RP | JaMa: did you have the -P output for that? | 16:23 |
JaMa | or should I move the return to some better place? I'm looking at the execute() function but don't see where it would make most sense | 16:23 |
JaMa | no, I have -P output only for this short 6min part | 16:24 |
JaMa | 90mins is *way* too long, but with older master-next it was over 10 hours, so it's a nice improvement :) | 16:27 |
RP | JaMa: Right, it's better. What was it before we started messing with runqueue? | 16:27 |
JaMa | before messing with runqueue (bitbake 1f630fdf0260db08541d3ca9f25f852931c19905) it was over 4 hours | 16:28 |
RP | JaMa: so we did actually get better, it's still just slow | 16:29 |
JaMa | I can try even older revision, but on small sample (core-image-minimal) this revision was the fast baseline | 16:29 |
*** goliath <goliath!~goliath@212-186-42-13.cable.dynamic.surfer.at> has joined #yocto | 16:29 | |
JaMa | or something is messing with my benchmark like those PRserver connections | 16:30 |
JaMa | will try to disable PRserv as well | 16:30 |
RP | JaMa: I tried a "bitbake -n world" for poky so effectively oe-core and it takes 2m50 | 16:35 |
*** vineela <vineela!vtummala@nat/intel/x-hjjhjvtldyngfbkt> has joined #yocto | 16:35 | |
dkc | Hey, i'm trying to implement something similar to that: https://stackoverflow.com/questions/52729727/how-to-include-git-revision-of-image-layer-in-boot-output-or-etc-issue | 16:35 |
RP | JaMa: appears to be around 200 tasks/second (12044 in total) | 16:35 |
dkc | but the variable with the layer revision seems to never be updated | 16:35 |
JaMa | for me it was about 10 tasks/second, but most of the time was spent between "Executing tasks" message and the next line "Executing task (1 from 71749)" | 16:38 |
JaMa | RP: I should also note that I had BB_NUMBER_THREADS = "8" on this 72threads machine | 16:40 |
JaMa | most of the time it looks like this: | 16:42 |
JaMa | PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command | 16:43 |
JaMa | 28719 mjansa 20 0 1328M 1265M 7096 R 100. 0.3 4:03.05 ├─ python3 /bitbake/bin/bitbake world -P -n | 16:43 |
JaMa | 28888 mjansa 20 0 496M 370M 8856 S 0.0 0.1 0:01.91 │ └─ python3 /bitbake/bin/bitbake-worker decafbadbad | 16:43 |
JaMa | 28889 mjansa 20 0 496M 370M 8856 S 0.0 0.1 0:00.01 │ └─ python3 /bitbake/bin/bitbake-worker decafbadbad | 16:43 |
JaMa | 28712 mjansa 20 0 164M 29380 9532 S 40.7 0.0 0:10.01 │ │ └─ python3 /bitbake/bin/bitbake world -P -n | 16:43 |
JaMa | 28718 mjansa 20 0 164M 29380 9532 R 17.7 0.0 0:04.14 │ │ └─ python3 /bitbake/bin/bitbake world -P -n | 16:43 |
JaMa | and only the first one being busy after "NOTE: Executing Tasks" is shown | 16:44 |
RP | JaMa: that makes sense, the workers get spawned but just return so only the cooker would be loaded | 16:45 |
*** blueness <blueness!~blueness@gentoo/developer/blueness> has joined #yocto | 16:46 | |
JaMa | aren't workers spawned only when it also shows something like "Executing task (1 from 71749)"? | 16:47 |
RP | JaMa: yes | 16:48 |
JaMa | I'm not that far, it will take 1-3 hours until I get to that point | 16:48 |
JaMa | Initialising tasks: 100% |###############################################################################################################################################################################################################################| Time: 0:03:20 | 16:48 |
JaMa | Sstate summary: Wanted 25260 Found 0 Missed 25260 Current 0 (0% match, 0% complete) | 16:48 |
JaMa | NOTE: Executing Tasks | 16:48 |
RP | JaMa: Setting to 8 threads gave 3m1 | 16:50 |
JaMa | interesting, lets see what the profile data will show, but will probably send it tomorrow | 16:51 |
JaMa | this build has also multilib enabled, so it almost doubles the number of available tasks, I should have used something a bit smaller | 16:52 |
*** georgem_home <georgem_home!uid210681@gateway/web/irccloud.com/x-fqfuyhgvjkzaeenl> has quit IRC | 16:56 | |
*** nagio <nagio!~andrea@host158-71-dynamic.4-87-r.retail.telecomitalia.it> has joined #yocto | 16:59 | |
*** User_ <User_!~learningc@121.122.92.39> has joined #yocto | 17:00 | |
*** learningc <learningc!~learningc@121.122.92.39> has quit IRC | 17:04 | |
RP | JaMa: I'll have to run some more experiments, a build with meta-oe in here is definitely slower :/ | 17:13 |
RP | JaMa: 33000 tasks, 22minutes | 17:15 |
*** goliath <goliath!~goliath@212-186-42-13.cable.dynamic.surfer.at> has quit IRC | 17:19 | |
JaMa | RP: this is with latest bitbake? | 17:26 |
RP | JaMa: yes | 17:27 |
RP | JaMa: putting it under profiling now but I really need to look at the correctness problems of the last build :/ | 17:28 |
RP | Looks like we've got some kind of nasty metadata race somewhere :/ | 17:29 |
JaMa | will bisect where bitbake started terminating PRserv, it's not one of 2 commits I was suspecting (the cleanup for termination) | 17:31 |
JaMa | looks like this one: 05888700 cooker: Improve hash server startup code to avoid exit tracebacks | 17:31 |
JaMa | this issue is reproducible with 'PRSERV_HOST = "localhost:0"' in local.conf | 17:33 |
JaMa | but probably not related to the delay I'm seeing, because for last run with -P I've disabled PRserv and rm_work as well and it's still sitting between "Executing tasks" and executing them | 17:34 |
JaMa | RP: if I remove self.handlePRServ() added to reset() it works again | 17:39 |
RP | JaMa: right, that would account for the socket issue but probably not the delays | 17:40 |
*** T_UNIX <T_UNIX!uid218288@gateway/web/irccloud.com/x-ziblbfxkpvkrfmxu> has quit IRC | 17:52 | |
*** Bunio_FH <Bunio_FH!~bunio@clj-165.netdrive.pl> has joined #yocto | 17:52 | |
JPEW | RP: Added more stats logging and multiple threads to the hash server: http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=jpew/hashserve-stats&id=66c70708515cdeb903a93bd738b8f6a87cdd3926 | 18:03 |
dkc | do you have tips on how I could include the git revision of our custom layer in the yocto image? I have a hacky solution with a "nostamp" task, but as a consequence it forces the generation of the rootfs even if nothing changed, I'd like to avoid that | 18:07 |
JaMa | dkc: check ./meta/classes/image-buildinfo.bbclass | 18:15 |
dkc | JaMa: looks like exactly like what I need, thanks | 18:20 |
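For reference, the class JaMa points dkc to is enabled from local.conf. A minimal sketch (variable names as in image-buildinfo.bbclass; check the copy in your release):

```conf
# Record build metadata, including the branch and git revision of every
# layer in BBLAYERS, inside the image (written to /etc/build by default):
INHERIT += "image-buildinfo"

# Optionally choose which variables are recorded next to the layer list:
IMAGE_BUILDINFO_VARS = "DISTRO DISTRO_VERSION"
```

Unlike a "nostamp" task, the class only re-runs when its inputs change, so it avoids forcing rootfs regeneration.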
*** dreyna <dreyna!~dreyna@c-24-5-28-247.hsd1.ca.comcast.net> has quit IRC | 18:21 | |
JaMa | is it expected that core-image-minimal depends on things like libx11 now? It seems to be caused by systemd -> dbus -> libx11: "dbus.do_package" -> "libx11.do_packagedata" | 18:31 |
*** Bunio_FH <Bunio_FH!~bunio@clj-165.netdrive.pl> has joined #yocto | 18:31 | |
*** Bunio_FH <Bunio_FH!~bunio@clj-165.netdrive.pl> has quit IRC | 18:34 | |
*** goliath <goliath!~goliath@clnet-p04-043.ikbnet.co.at> has joined #yocto | 18:34 | |
mischief | it seems if i use fitimage and also INITRAMFS_IMAGE_BUNDLE my kernel has the initramfs twice. is there a way to prevent that? | 18:39 |
jwessel | RP: I got another hit with more logs, and the way things line up exposes a lack of understanding of how this could happen. | 18:40 |
jwessel | I am not so sure that pseudo is the problem either. | 18:40 |
jwessel | If we assume that pseudo is not doing the inode assignment. | 18:41 |
jwessel | It is asked to instantiate a hard link, but there is no "origination" location. | 18:41 |
jwessel | Example: | 18:42 |
jwessel | (exact 1 d /L/2.28-r0/locale-tree/usr/lib64/locale/ar_JO.ISO-8859-6/LC_NUMERIC) (exact 1 duf /L/2.28-r0/locale-tree/usr/lib64/locale/ar_JO.ISO-8859-6/LC_NUMERIC) | 18:42 |
jwessel | (new?) new link: path /L/2.28-r0/locale-tree/usr/lib64/locale/ar_DZ.ISO-8859-6/LC_NUMERIC.tmp, ino 190901401, mode 0100644 | 18:42 |
jwessel | No origin inode data: 190901401 [ no path ] | 18:42 |
jwessel | I went and poked the actual dir structure, and sure enough, that is the only instance of that inode. It should have been linked off to other locations with the LC_NUMERIC, since they are all the same file. | 18:43 |
jwessel | I don't really understand how that can happen though. | 18:43 |
RP | jwessel: How can you hardlink to something which doesn't have an inode you're linking to? | 18:44 |
jwessel | I have no idea. That is why I put a new log line in there, to prove it was hitting this condition. | 18:44 |
jwessel | I need to trace back to the requestor, since down at the pseudo DB level / server level that information is long gone. | 18:45 |
jwessel | Somehow this file was turned into a copy instead of a link. | 18:45 |
RP | jwessel: It at least gives a clue on what we're looking for... | 18:45 |
jwessel | All the failures (I have 3 so far), have the exact same signature. | 18:46 |
jwessel | I thought about putting cleanup code in pseudo to "fix it up", but that clearly isn't the right way to deal with this. | 18:47 |
jwessel | We have a garbage in, garbage out situation. | 18:47 |
jwessel | I just don't understand where the garbage came from. Clearly it is an error which should probably be fatal if you are asked to hard link something for which there is no reference. | 18:48 |
RP | jwessel: right, I think we need to understand it more... | 18:48 |
jwessel | I don't get how it becomes a copy though. | 18:48 |
RP | that is odd... | 18:49 |
jwessel | Just thought I'd provide an update or if you or anyone else has insights, I am open to any input. | 18:49 |
RP | jwessel: I appreciate the update but I'm not close enough to the code to have useful input :( | 18:50 |
RP | jwessel: I do agree it's odd though, as how would a hardlink become a copy? Maybe a libc call fails and this is the fallback? | 18:50 |
jwessel | I was chatting with marka earlier. He mentioned there is glibc.bbclass or something which deals with some of this. | 18:51 |
*** fray <fray!~fray@kernel.crashing.org> has joined #yocto | 18:51 | |
jwessel | It is specific to how the locales are copied. | 18:51 |
jwessel | I'll look there next. | 18:51 |
jwessel | I am not sure how to track down the caller yet, but we do know that by the time pseudo (the server side) is asked to process the hard link, it is already trashed. | 18:52 |
RP | jwessel: yes, if it's being copied in our code then it will try a hardlink; if that fails it will resort to a copy | 18:52 |
jwessel | It is still possible the pseudo client end is broken in some way. | 18:52 |
RP | jwessel: we have the copy fallback for cases spanning different disks or similar | 18:53 |
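The fallback RP describes (try a hardlink, resort to a byte copy when the link fails, e.g. across filesystems) looks roughly like this generic sketch — an illustration of the pattern, not oe-core's actual code:

```python
import os
import shutil

def link_or_copy(src, dst):
    """Try to hardlink src to dst; fall back to copying on failure
    (e.g. EXDEV when src and dst live on different filesystems)."""
    try:
        os.link(src, dst)
    except OSError:
        # The fallback allocates a new inode, so dst is no longer "the
        # same file" as src -- which is how a spurious link failure
        # silently turns an expected hardlink into a stray copy.
        shutil.copy2(src, dst)
```

On the same filesystem the two paths end up sharing one inode; a path where the link step failed shows up later with `st_nlink == 1`, the signature jwessel is seeing in the locale tree.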
RP | jwessel: Its glibc-locale.inc and libc-package.bbclass | 18:55 |
jwessel | I'll have to figure out what to instrument, but I'll start with old-fashioned code inspection first. | 18:55 |
jwessel | I'd like to see what lines up with what prints out in the logs. | 18:55 |
jwessel | hmm... I seem to have stumbled on what happened, but it will take a while to figure out what to instrument next. | 19:05 |
jwessel | https://pastebin.com/DwnTENVU | 19:05 |
*** tgraydon <tgraydon!tgraydon@nat/intel/x-skpphsfafofltkvt> has joined #yocto | 19:05 | |
jwessel | The evidence shows the original file was created and the reference was purged. | 19:05 |
jwessel | so it became a copy. | 19:05 |
*** asabil <asabil!~asabil@2a01:79d:7375:2ca4:fd3d:19c0:f6c1:5aea> has joined #yocto | 19:08 | |
RP | jwessel: I wonder if some kind of atomic op guarantee on a libc call was broken by pseudo? | 19:12 |
jwessel | Good question. I am inspecting through the pseudo code first, then I need to try and find the caller. | 19:12 |
fray | calls used to be blocking until recently.. but that isn't the issue cause this stuff was broken before that change.. | 19:13 |
jwessel | case OP_MAY_UNLINK: | 19:16 |
jwessel | if (pdb_may_unlink_file(msg, msg->client)) { | 19:16 |
jwessel | /* harmless, but client wants to know so it knows | 19:16 |
jwessel | * whether to follow up... */ | 19:16 |
jwessel | msg->result = RESULT_FAIL; | 19:16 |
jwessel | } | 19:16 |
jwessel | I am not so sure we do the right thing... Or a badly written app might not do the right thing. | 19:16 |
jwessel | More instrumentation is required to find out exactly where we go off the rails. | 19:17 |
*** elGamal <elGamal!~elg@5.253.206.70> has quit IRC | 19:30 | |
*** learningc <learningc!~learningc@121.122.92.78> has joined #yocto | 19:45 | |
*** User_ <User_!~learningc@121.122.92.39> has quit IRC | 19:47 | |
*** vpaladino778 <vpaladino778!18f6646e@access.ips-yes.com> has joined #yocto | 19:51 | |
vpaladino778 | Hey folks. I was given a Poky SDK Installer. I ran it and tried running the 'environment-setup-armv5e-poky-linux-gnueabi' file that it created, but nothing seems to be happening | 19:51 |
Crofton|work | that just sets some environment variables | 19:53 |
vpaladino778 | Do you know how i would compile a program using the Poky SDK that i installed? | 19:54 |
fray | once you source it.. call 'make' or autoconf or.... | 19:57 |
fray | if you just want to compile a single file, $CC -o output input.c | 19:57 |
fray | no magic involved.. standard environment variables are set | 19:57 |
*** asabil <asabil!~asabil@2a01:79d:7375:2ca4:fd3d:19c0:f6c1:5aea> has quit IRC | 19:59 | |
vpaladino778 | So after running 'environment-setup', I can just compile as I normally would and it will compile for the target platform? | 20:03 |
fray | that is the idea.. just be sure you use the environment variables and not direct calls to 'gcc' | 20:04 |
vpaladino778 | Ahah. I understand. Thank you for your help. | 20:07 |
vpaladino778 | One last question. How can I make sure that I 'use the environment variables'? Do I just have to make sure I run the 'environment-setup' file in the current shell session? | 20:08 |
vpaladino778 | Sorry for my naivety. I'm a new grad and this is all pretty new to me | 20:09 |
JPEW | vpaladino778: Make sure your build system (make, autotools, meson, cmake, whatever) uses them | 20:09 |
JPEW | Including using them yourself if you are your own build system, e.g.: `$CC -o hello hello.c` | 20:10 |
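Putting the advice above together, a typical SDK session looks like the following sketch (the install path is an example; use the environment-setup file your installer actually created):

```shell
# Hypothetical install location; substitute the script your installer created.
SDK_ENV=/opt/poky/environment-setup-armv5e-poky-linux-gnueabi

# Must be sourced in the *current* shell ("." or "source"): the script only
# exports variables ($CC, $CFLAGS, $LDFLAGS, ...), which is why running it
# appears to do nothing.
[ -e "$SDK_ENV" ] && . "$SDK_ENV"

# After sourcing, build via the variables, never by calling gcc directly:
#   $CC -o hello hello.c                                # single file
#   make                                                # make reads CC/CFLAGS/LDFLAGS
#   ./configure --host=arm-poky-linux-gnueabi && make   # autotools
```

The key point for the "nothing seems to be happening" symptom: sourcing only mutates the current shell's environment, so it produces no output, and running the script as a child process would discard the variables entirely.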
*** linuxjacques <linuxjacques!~jacques@nslu2-linux/jacques> has quit IRC | 20:12 | |
*** bluelightning <bluelightning!~paul@pdpc/supporter/professional/bluelightning> has joined #yocto | 20:14 | |
*** leitao <leitao!~leitao@2620:10d:c092:200::1:acf8> has quit IRC | 20:14 | |
*** linuxjacques <linuxjacques!~jacques@nslu2-linux/jacques> has joined #yocto | 20:25 | |
*** otavio <otavio!~otavio@debian/developer/otavio> has joined #yocto | 20:32 | |
*** vpaladino778 <vpaladino778!18f6646e@access.ips-yes.com> has quit IRC | 20:37 | |
*** asabil <asabil!~asabil@2a01:79d:7375:2ca4:fd3d:19c0:f6c1:5aea> has joined #yocto | 20:43 | |
*** behanw <behanw!uid110099@gateway/web/irccloud.com/x-erdyoaqratczusxr> has quit IRC | 20:49 | |
RP | JPEW: Going to try that patch, thanks! | 20:49 |
*** elGamal <elGamal!~elg@5.253.206.78> has joined #yocto | 20:53 | |
*** nabokov <nabokov!~armand@67.218.223.154> has quit IRC | 20:55 | |
*** asabil_ <asabil_!~asabil@2a01:79d:7375:2ca4:195:1cc1:2b31:ba74> has joined #yocto | 21:00 | |
*** asabil <asabil!~asabil@2a01:79d:7375:2ca4:fd3d:19c0:f6c1:5aea> has quit IRC | 21:03 | |
*** behanw <behanw!uid110099@gateway/web/irccloud.com/x-sqfajjwxfzgdvjvg> has joined #yocto | 21:04 | |
JaMa | jwessel: have you seen this? http://git.openembedded.org/openembedded-core-contrib/tree/meta/recipes-devtools/pseudo/pseudo-test.bb?h=jansa/pseudo2 it's definitely a badly written app and breaks pseudo every single time (maybe with a different root cause though) | 21:05 |
jwessel | I had not seen that one. | 21:05 |
JaMa | more details in https://bugzilla.yoctoproject.org/show_bug.cgi?id=12434#c48 | 21:06 |
yocti | Bug 12434: normal, Medium+, 2.8 M3, randy.macleod, ACCEPTED , pseudo: Incorrect UID/GID in packaged files | 21:06 |
RP | JPEW: To continue that conversation from earlier, I just dreamt up the idea of task-specific hosttools using symlinks | 21:10 |
RP | Its so crazy we have to do it... | 21:10 |
jwessel | JaMa: It doesn't seem at first glance that is the same problem. | 21:10 |
RP | JPEW: would solve the task rss contamination problem in theory to a large extent | 21:11 |
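The idea in RP's two lines — give each task a PATH containing only symlinks to the host tools it is allowed to see — could look like this toy model (a hypothetical helper for illustration, not bitbake's implementation):

```python
import os
import shutil
import tempfile

def make_hosttools_dir(allowed):
    """Create a directory holding only symlinks to the allowed host
    tools.  Used as the sole PATH entry for a task, it makes any
    unlisted tool fail to resolve instead of silently leaking host
    dependencies into the build."""
    tooldir = tempfile.mkdtemp(prefix="hosttools-")
    for tool in allowed:
        real = shutil.which(tool)
        if real is None:
            raise RuntimeError("required host tool missing: %s" % tool)
        os.symlink(real, os.path.join(tooldir, tool))
    return tooldir
```

With a per-task whitelist, `shutil.which("ls", path=tooldir)` returns None for anything not granted to that task, which is the contamination guard being described.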
*** berton <berton!~berton@181.220.83.67> has quit IRC | 21:17 | |
RP | jwessel: we've had far too many rays of hope with this bug where we think we may have found it, then not.... | 21:24 |
jwessel | I don't doubt that. This is fairly complex. | 21:24 |
jwessel | I see something happening consistently with each build, even in the ones that don't actually fail. | 21:25 |
RP | jwessel: I've suspected/hoped for that. Question is why some fail and some don't | 21:26 |
RP | jwessel: if we could make it 100% failing we'd no doubt quickly figure it out :) | 21:26 |
jwessel | That answer is pretty easy, from the logs. | 21:26 |
jwessel | If I add some sleep into pseudo, it will probably fail every time, but I am not sure. | 21:26 |
jwessel | The logs indicate it is picking up the underlying value (whatever was used last) for the inode, in the case of the hard link that was turned into a file. | 21:27 |
jwessel | It happens a few times in every single build, but the ones that pass are the ones where it has selected UID 0. | 21:27 |
jwessel | It really should be 100% impossible for the hard link to have no source reference. | 21:28 |
jwessel | That is why I put a print log there. I figured it shouldn't be hitting that line of code _ever_ | 21:28 |
JaMa | interesting part for me was that it's consistent when restoring "bad" do_package archive from sstate (so once you create and store bad archive it will always fail as long as you're using the same sstate signature) | 21:29 |
jwessel | I haven't been able to make a shell script which makes it act the same. | 21:29 |
RP | jwessel: so my inode reuse theory is actually right! | 21:29 |
jwessel | Well technically it appears to be the pseudo cache. | 21:29 |
jwessel | The problem is a bit hard to track because each instance involves 3 inodes. | 21:30 |
RP | jwessel: Right, that was what I'd theorised though - that some file known to pseudo was deleted and then a new file was created using the same inode so the permissions/user were copied | 21:30 |
JaMa | and that pseudo writes a lot of warnings and errors in each build, even the ones which probably ended OK :) | 21:31 |
RP | jwessel: it kind of implies we're missing deletion tracking with pseudo which is possible as we remove files out of pseudo context | 21:31 |
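RP's theory — a file known to pseudo is deleted outside its view, the filesystem reuses the inode, and the stale ownership gets applied to an unrelated file — reduces to a small deterministic model (a toy illustration of the suspected failure mode, not pseudo's actual database):

```python
# Toy model: a pseudo-style ownership db keyed by inode number.
db = {}

def record(ino, path, owner):
    db[ino] = (path, owner)

def lookup(ino):
    return db.get(ino)

# A file created under pseudo, faked as root-owned (inode from the log above):
record(190901401, "/L/2.28-r0/locale-tree/.../LC_NUMERIC", "root")

# The file is deleted *outside* the pseudo context, so the db never sees
# the unlink; the filesystem is then free to hand the same inode number
# to a completely unrelated new file.  A lookup for that new file now
# returns the stale path and ownership:
assert lookup(190901401) == ("/L/2.28-r0/locale-tree/.../LC_NUMERIC", "root")
```

This is also why RP's periodic db-vs-disk sanity check would catch it: the stale entry's path no longer matches any file with that inode.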
jwessel | Is there any way that is happening? | 21:31 |
RP | jwessel: I wish we could do path filtering in pseudo :/ | 21:31 |
RP | jwessel: absolutely certain its happening | 21:31 |
jwessel | It looked to me like the delete was actually there, and it's more like there is something happening in parallel. | 21:31 |
RP | jwessel: it was only a theory but I know we do delete files out of context | 21:32 |
RP | (why would we load pseudo and do deletion under it?) | 21:32 |
jwessel | As long as you are sane about it, it should be ok. | 21:32 |
RP | jwessel: what could be interesting would be doing a sanity check of pseudo's db against the disk periodically | 21:33 |
jwessel | I can say with certainty I know how to re-use the inodes. | 21:33 |
jwessel | On a quiet disk I can allocate and re-allocate any which way I want. | 21:33 |
RP | jwessel: I never figured that out :) | 21:34 |
jwessel | But I am trying to limit it down to something reproducible. | 21:34 |
RP | jwessel: right, that is what we need | 21:34 |
jwessel | I can't see at the moment how to get it to the state where the ln proceeds without the whole "load my system and hope it happens..." | 21:34 |
jwessel | At this point I do see the bad state with every single run of the do_package. | 21:35 |
* jwessel has a few more test cases to try and duplicate it, before the day is over for now. | 21:36 | |
* RP is hoping jwessel can figure it out :) | 21:41 | |
jwessel | This might take days. | 21:41 |
*** asabil_ <asabil_!~asabil@2a01:79d:7375:2ca4:195:1cc1:2b31:ba74> has quit IRC | 21:42 | |
jwessel | As with all problems like this, finding the root cause doesn't mean it will be easy to fix. | 21:42 |
RP | JPEW: new server swaps build warnings for failures: https://autobuilder.yoctoproject.org/typhoon/#/builders/15/builds/1171 | 21:43 |
RP | jwessel: I know. I'm dealing with a similar multiheaded monster in the form of runqueue too :/ | 21:44 |
RP | JPEW: {"connections": {"total_time": 130.6708268285729, "max_time": 0.027425227221101522, "num": 485853, "average": 0.0002689513635370635, "stdev": 0.0002935243837515032}, "requests": {"total_time": 824.3801162638701, "max_time": 0.43299578316509724, "num": 485848, "average": 0.0016967860653205739, "stdev": 0.0015868506651319188}} | 21:46 |
JPEW | RP: hmm... did you remove the io and sql stats? | 21:47 |
RP | JPEW: no? | 21:47 |
RP | JPEW: oh, I think I've missed a commit | 21:48 |
JPEW | Ah.... I'm actually impressed that applies cleanly :) | 21:48 |
RP | JPEW: yes, so am I looking at it! | 21:49 |
JPEW | RP: Git is a mysterious entity | 21:49 |
RP | JPEW: would that cause the connection reset issue? | 21:49 |
JPEW | Possibly? | 21:50 |
JPEW | I have been running bitbake-selftest hashserv.tests on my changes | 21:50 |
JPEW | If the server was getting a python exception in the thread I think that would cause a connection reset | 21:51 |
RP | JPEW: that was what I was just thinking looking at the missing import | 21:52 |
JaMa | RP: time bitbake world -P -n is still running after over 5 hours, not yet executing tasks (on older bitbake), so it might be even slower than the 253min run done earlier today on the same build | 21:53 |
*** georgem_home <georgem_home!uid210681@gateway/web/irccloud.com/x-ubcroiigjwhpfzrj> has joined #yocto | 21:56 | |
RP | JaMa: That is just crazy :( | 21:57 |
RP | JaMa: I need to figure out what is wrong. I think my more limited world build should give me the baseline profile data I need... | 21:57 |
RP | JPEW: I've sorted out the patches and restarted everything, lets try again | 21:58 |
JPEW | Cool | 21:58 |
* RP knows he's going to run out of time on this soon :( | 21:58 | |
RP | JPEW: https://autobuilder.yoctoproject.org/typhoon/#/builders/48/builds/938 :( | 22:06 |
JPEW | RP: Argh.... OK. | 22:08 |
RP | I'll try and keep the logging but revert back to the previous threading | 22:09 |
JPEW | RP: OK. I have to go home, but I'll give it a think. | 22:09 |
RP | JPEW: thanks, this was a good try :) | 22:09 |
*** agust <agust!~agust@p54833DBB.dip0.t-ipconnect.de> has quit IRC | 22:10 | |
RP | JPEW: I'm just reverting so I can test the other changes in -next which are a mix of normal patches and runqueue fixes | 22:10 |
jwessel | So, in a non-failed build I absolutely have the same problem each time, as I mentioned before. | 22:20 |
jwessel | I see how it happens, but still don't know the root cause. | 22:21 |
jwessel | There is a pile of these requests coming in, in parallel. And the mv bit is not atomic. | 22:22 |
jwessel | The unlink occurs before the rename | 22:23 |
jwessel | I haven't determined whether the problem is actually the fault of pseudo or not. | 22:23 |
jwessel | I don't understand how all these db requests are coming in, in parallel. | 22:23 |
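The non-atomic sequence jwessel describes, and the atomic alternative, in a minimal sketch (generic rename semantics, not the code pseudo actually intercepts):

```python
import os
import tempfile

d = tempfile.mkdtemp()
tmp = os.path.join(d, "LC_NUMERIC.tmp")
dst = os.path.join(d, "LC_NUMERIC")
for path, data in ((tmp, "new"), (dst, "old")):
    with open(path, "w") as f:
        f.write(data)

# Non-atomic: between these two calls the destination name is gone and
# its old inode is free for reuse -- the window a stale inode-keyed db
# can fall into when requests arrive in parallel.
os.unlink(dst)
os.rename(tmp, dst)

# Atomic: rename(2)/os.replace() overwrites the destination in a single
# step, so there is never a moment when the name is missing.
tmp2 = os.path.join(d, "LC_NUMERIC.tmp2")
with open(tmp2, "w") as f:
    f.write("newer")
os.replace(tmp2, dst)
```

POSIX rename(2) already replaces an existing destination atomically, so the explicit unlink-before-rename buys nothing and opens the race.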
jwessel | BTW, if we were to add a QA check against the files we know are supposed to all be hard linked together, it would fail 100% of the time. | 22:25 |
jwessel | We get a couple strays that became hard copies each time. | 22:25 |
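The QA check proposed here could be as small as grouping the files by inode and flagging any expected-hardlink set that fragments into several inodes (a hypothetical helper, not an existing insane.bbclass test):

```python
import os

def hardlink_strays(paths):
    """Group paths that are expected to be hardlinked together by inode
    and return every path outside the largest group -- the stray copies."""
    groups = {}
    for p in paths:
        groups.setdefault(os.stat(p).st_ino, []).append(p)
    biggest = max(groups.values(), key=len)
    return [p for g in groups.values() if g is not biggest for p in g]
```

Run over a locale tree where every LC_NUMERIC should share one inode, a non-empty result is exactly the "couple strays that became hard copies" seen each build.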
*** BCMM <BCMM!~BCMM@unaffiliated/bcmm> has quit IRC | 22:26 | |
RP | jwessel: That sounds like a good way to debug this if we have a sentinel we can spot.... | 22:26 |
jwessel | By luck of the draw most of the time it has not failed due to the QA check being for the builder UID. | 22:26 |
RP | jwessel: most files we delete would be owned by "root", not the build user | 22:26 |
jwessel | It is going to be a different file every time, but the investigation thus far shows 100% failure rate. | 22:26 |
RP | right, statistically we process enough files that something breaks | 22:27 |
jwessel | I really need to have a hard look at the operations. | 22:27 |
jwessel | But that is it for now. | 22:27 |
jwessel | For today anyway. | 22:27 |
RP | jwessel: its great progress as I can think of ways of debugging this further from there! :) | 22:28 |
*** neverpanic <neverpanic!~clemens@towel.neverpanic.de> has quit IRC | 22:30 | |
*** neverpanic <neverpanic!~clemens@towel.neverpanic.de> has joined #yocto | 22:32 | |
*** tijko <tijko!~tijko@unaffiliated/tijko> has quit IRC | 22:40 | |
RP | kanavin, JaMa: the notifications are about the generation of task specific profiles | 22:42 |
RP | JPEW: managed another connection reset without the threading patch :( | 22:46 |
RP | definitely not as common | 22:46 |
JaMa | RP: I'm getting close with that profile, now on task 50K of 62K | 22:54 |
JaMa | now when some tasks are "running" there is some load on the worker process as well | 22:56 |
RP | JaMa: that makes sense since I guess it's instructed to fork() | 22:57 |
*** vineela <vineela!vtummala@nat/intel/x-hjjhjvtldyngfbkt> has quit IRC | 23:10 | |
JaMa | heh, it finished.. *a lot* of profile* files in this directory :), will upload in a sec | 23:10 |
JaMa | http://paste.ubuntu.com/p/Nw6n5hTPmF/ http://paste.ubuntu.com/p/sydFGvYTxQ/ http://paste.ubuntu.com/p/RMGbm9DyZz/ | 23:16 |
JaMa | the first is the "short 5 min" build till "Executing Tasks", then the profile.log.processed, and last is from profile-worker.log.processed | 23:17 |
JaMa | all 3 with bitbake 1f630fdf0260db08541d3ca9f25f852931c19905 (before most runqueue changes) | 23:17 |
JaMa | maybe the logger.debug in scenequeue_updatecounters()? | 23:35 |
*** goliath <goliath!~goliath@clnet-p04-043.ikbnet.co.at> has quit IRC | 23:38 | |
*** florian <florian!~florian_k@Maemo/community/contributor/florian> has quit IRC | 23:56 |
Generated by irclog2html.py 4.0.0 by Marius Gedminas - find it at https://mg.pov.lt/irclog2html/!