Database search anomaly

Posted: Sat Jul 16, 2016 10:23 am
by matesic
Searching for "Publications in 1916", I was wondering why certain pages failed to appear and discovered that whereas a page in which the "First publication" field contains only the date appears correctly in the search, another in which the date is followed by the name of the publisher fails to appear. This seems very unfortunate! Might not the search criteria be modified so as to embrace such variations, which can't be uncommon?

Re: Database search anomaly

Posted: Sat Jul 16, 2016 10:59 pm
by cypressdome
Are you talking about categories such as Scores published in 1916 and all the other "Scores published in YYYY" categories? If so, pages are included in those categories through the use of the P template in the Publisher Information field for individual scores. The date and other information in the Year of First Publication field in the Work Info area has no effect on it.

Re: Database search anomaly

Posted: Tue Jul 19, 2016 6:42 pm
by matesic
The specific pages that illustrate my problem are 1. String Quartet No.8 by J.B.McEwen, which appears under "Scores published in 1916"; 2. String Quartet No.5 by W.H.Reed, which doesn't. The only difference I could detect that might be responsible was in the entries under "General Information; First publication", which I now see should contain just the date, but which for Reed contained not only "1916" but also the city and name of the publisher (yes, it was me that made the entry!). I've now edited the field to contain just the date, but apparently to no effect.

Re: Database search anomaly

Posted: Thu Jul 21, 2016 9:26 am
by matesic
OK, now I see it. The Publisher Information field for the Reed quartet contained the word "London" twice ("London: London: Cary & Co., 1916. Plate C.B. 122."; amazing how the eye can miss that!). The same field for the McEwen quartet reads "London: Cary & Co., 1916. Plate C.B.126."
On the edit page the Reed entry was
|Publisher Information=London: [[Cary & Co.]], 1916. Plate C.B.122.
while the McEwen entry is quite different:
|Publisher Information={{P|Cary & Co.||London||1916||C.B. 126}}
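For anyone wanting to check their own entries, the difference between these two fields can be detected with a rough Python sketch like the one below. It is not part of IMSLP or its wiki software, and the helper name p_template_args is made up for illustration. It assumes, based only on the two entries quoted above, that a field counted in the "published in" categories contains a {{P|...}} call and that the year is one of its pipe-separated arguments. It does not handle nested templates, and its naive split on "|" would also break piped links such as [[Target|label]].

```python
import re
from typing import List, Optional

# Matches the first {{P|...}} call; assumes no nested templates inside it.
P_TEMPLATE = re.compile(r"\{\{\s*P\s*\|(.*?)\}\}", re.DOTALL)


def p_template_args(field: str) -> Optional[List[str]]:
    """Return the arguments of the first {{P|...}} call, or None if there is none.

    Hypothetical helper: splitting on '|' is naive and would also split
    piped links such as [[Target|label]].
    """
    match = P_TEMPLATE.search(field)
    if match is None:
        return None
    return [arg.strip() for arg in match.group(1).split("|")]


reed = "London: [[Cary & Co.]], 1916. Plate C.B.122."
mcewen = "{{P|Cary & Co.||London||1916||C.B. 126}}"

print(p_template_args(reed))    # None: plain text, no P template
print(p_template_args(mcewen))  # ['Cary & Co.', '', 'London', '', '1916', '', 'C.B. 126']
```

The empty strings come from the doubled "||" separators in the McEwen entry. The sketch only reports what is in the field and makes no assumption about what each argument position means to the template.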

My mistake for sure (now corrected), but how many others have made a similar goof, and shouldn't it have been automatically detected and corrected? If the database isn't idiot-proof, how far can it be trusted? I was hoping to show how the number of music publications in 1916 slumped to a 50-year low, making a pretty dramatic correlate of the end of romanticism (incidentally, the style of the Reed quartet I described as "Early 20th century", while the McEwen is "Romantic"!) but now I'm statistically holed below the waterline.

Re: Database search anomaly

Posted: Fri Jul 22, 2016 9:26 am
by matesic
The anomaly looks as serious in its implications as I feared it might be. Amongst 5 other 20th century original scores that I scanned and uploaded myself (first IMSLP entry of each piece), I find 4 of them don't appear when I search for "Scores published in 19**". It looks like the P template may be sensitive to the precise order or format of the information entered in the Publisher Information field (date, city, publisher, plate no. etc). As a second contrasting example, the unfortunate administrator responsible for maintaining the database (?cypressdome, still on vacation in the Florida woods?) might take a look at the 5th and 6th violin sonatas of Francis J.Morgan. Although the work pages look correct enough, on the editing pages it will be seen that the contents of the Publisher Information field are quite differently formatted in each case. At least I got it right once, but surely I can't be the only contributor to have inadvertently corrupted the database!

Re: Database search anomaly

Posted: Fri Jul 22, 2016 1:30 pm
by matesic
Finally, for my own peace of mind I selected 25 string quartet pieces from the database, none of them with the work page initially created by me but all with an unambiguous date between 1916 and 1924 in the Publisher Information field. Only 11 of them came up in a "Published in XXXX" search.

Re: Database search anomaly

Posted: Tue Jul 26, 2016 3:42 am
by cypressdome
Sorry for my delay in responding. The P template was not created until 2011 and did not really get used much until around 2013 so it is very likely that it has not been applied to the vast majority of scores on IMSLP. The only way a work page will get listed in one of the "Scores published in YYYY" categories is if the P template is used in the Publisher Information field for the posted score. Using it is a manual process--nothing will auto-populate it during the score upload process. I know of several frequent contributors who use it but utilizing the template would probably be quite burdensome for the casual uploader. Several of us who do daily work on the wiki update pages to include the P template as we clean up pages. As you can imagine it's a slow process. Even so, I see that there are over 26,000 pages using the P template so I suppose we are making some headway.

Re: Database search anomaly

Posted: Tue Jul 26, 2016 6:28 am
by matesic
Thank you - this confirms what I've slowly come to understand. Unfortunately I think it's going to be a long, long time before queries that involve the P template access the entire database, so maybe there should be a warning that whatever data comes up under "Scores published in ..." is drastically incomplete? In the meantime, since a year-by-year breakdown would be a useful feature (to me at least), would it be possible to construct a simple query of each entry in the Publisher Information and/or First Publication field to detect the year?
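One possible shape for such a query is sketched below in Python. It is not something that exists on IMSLP. It assumes the raw text of the Publisher Information or First Publication fields has already been collected somehow (that step is not shown), and the helpers guess_year and tally are hypothetical. The heuristic drops everything from the word "Plate" onward, because plate numbers can themselves look like years. It then accepts a field only if exactly one distinct year between 1500 and 2099 remains; anything else is counted as unresolved rather than guessed.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Four-digit years 1500-2099, not embedded in a longer run of digits.
YEAR = re.compile(r"(?<!\d)(1[5-9]\d\d|20\d\d)(?!\d)")
# Everything from the word "Plate"/"Plates" onward (plate numbers can look like years).
PLATE = re.compile(r"\bPlates?\b.*", re.IGNORECASE | re.DOTALL)


def guess_year(field: str) -> Optional[int]:
    """Return the single publication year found in a free-text field, else None."""
    text = PLATE.sub("", field)
    years = {int(y) for y in YEAR.findall(text)}
    return years.pop() if len(years) == 1 else None


def tally(fields: Iterable[str]) -> Tuple[Counter, int]:
    """Count fields per detected year; fields with no year or several years are unresolved."""
    counts: Counter = Counter()
    unresolved = 0
    for field in fields:
        year = guess_year(field)
        if year is None:
            unresolved += 1
        else:
            counts[year] += 1
    return counts, unresolved


fields = [
    "London: London: Cary & Co., 1916. Plate C.B. 122.",
    "{{P|Cary & Co.||London||1916||C.B. 126}}",
    "1916",
]
print(tally(fields))  # (Counter({1916: 3}), 0)
```

Because it works on the plain text, this picks up the year whether or not the P template was used. Keeping the unresolved count separate makes it visible how much of the data the heuristic could not place, instead of silently skewing the year-by-year totals.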