2011-10-4

Doing Science in the Open

Motivation

Being a scientist brings more than the fun of investigating things you care about, freedom, and a decent salary. There are also some responsibilities. Money poured into research comes from the public and ends up in the pockets of highly profitable private publishers (think Elsevier) that limit who can access the findings. Why is that? Shouldn’t we be the first ones to make science available: anywhere, anytime, for everybody?

This principle is the essence of Open Science. In this post, I argue that there is a lack of sharing, collaboration, and openness in science. To support my claim, I will go over the typical scientific workflow, analyze current problems, and propose ways to improve. Some of my suggestions are not ready for prime time yet; however, a number of them can successfully be implemented today to the benefit of many.

My thinking about openness and sharing is heavily inspired by Michael Nielsen, who has recently published the book Reinventing Discovery: The New Era of Networked Science, where such problems are discussed in detail. My goal is rather to provide practical solutions for scientists working in vision research (but hopefully useful for others too).

Our research habits

Come up with an experiment. You need to read journals to get inspired / replicate findings. What if you can’t access some of them? What if your institution is poor? What if you’re just a curious high school student?

These questions expose fundamental flaws in our current way of doing science. The next steps will help to answer them.

Write code to run it (MATLAB, E-Prime). What if you want to reuse somebody’s code? How do you obtain it? Will it work on your machine? What if you don’t have the software? After all, it is expensive: MATLAB: € 360 / year (?); E-Prime: $ 795 / $ 995; and, oh, Windows: € 200 / € 309 / € 319.

While this reasoning might sound unimpressive (after all, who doesn’t have MATLAB these days?), there are situations where it did become a problem. Zdenek Kalal from the University of Surrey (UK), for example, has developed an impressive computer model for object tracking, known as Predator. It received lots of media attention, and thus kids all over the world got interested in trying it on their own. Kalal released the source code (as Open TLD), which was written in MATLAB. The third comment (!) in the discussion group already mentions porting it to C/C++, and the fourth complains about not having MATLAB to test it out. And thus, the community spent the coming months porting it to C++ rather than playing around and making use of it.

Why C++ now? Because it can be compiled and run with free and open source software (FOSS). If you use free and open source solutions, you can rest assured that everybody can use your code and, crucially, for free.

For our particular needs, we can code experiments in Python. It is user-friendly, fast, has psychophysics packages (PsychoPy, Vision Egg) and even point-and-click interfaces (PsychoPy, OpenSesame). Due to its huge popularity, Python is also extremely versatile:

  • got some C/C++ code? use Cython
  • want to go online? use Pyjamas, Sculpt
  • need parallel computing? use IPython
  • need more? somebody has probably done that already!

Furthermore, the Open-Source MATLAB-to-Python Compiler has made it simple to move your existing MATLAB scripts to Python.

But there is no need to limit yourself to just a FOSS programming language. In fact, all the necessary tools for research (described in this post) are conveniently packaged in a FOSS operating system, NeuroDebian, actively maintained by Hanke/Halchenko’s team in Jim Haxby’s group.

Collect data. Where do you get enough participants? Oh, and that’s expensive too!

The internet is a great resource for large numbers of cheap participants. A number of experiments that do not require specific conditions or equipment can be partly or fully conducted online. As mentioned in the previous section, Python can in principle be used to program such online experiments, though it is not straightforward at the moment.

But there is more. Why not make your experiments run in a browser, regardless of whether you will be running them online? When you publish your findings, people will be able to do your experiment for real and, as a result, get a better feeling of what it was like. Maybe they would uncover some problems with your design? Or maybe have great ideas on how to improve it or use it for a follow-up? In any case, experiencing the experiment first hand is much more informative than reading its description in a paper.

Finally, if you decide to go online, make your experiments fun! You will not only attract more participants and perhaps get away with not even paying them; making experiments game-like can also really help to crack difficult problems. For example, the Foldit gaming platform announced that it had resolved the crystal structure of an AIDS-related protein that had been bothering scientists for some 15 years, all of that in 10 days and all purely due to gamers.

Analyze data (MATLAB, Excel, SPSS). How much time do you spend analyzing it manually? Programming trivial things?

First of all, doing it manually in Excel is likely to be both inefficient and error-prone.

Now, if you write some code to analyze your data, publish your computer code: it is good enough – even if you are not good at programming and don’t comment your code. It will definitely benefit people who are trying to replicate your analyses for their data.

You can also help yourself and others by organizing your code using distributed version control systems like Git or Mercurial. They allow you to track the changes you make while coding (and always rewind if you need to!), while forcing you to comment on your changes. Another important feature is seamless merging of various versions of your code. A standard situation where this is useful is when you code both at home and at work. Instead of carrying files back and forth and thinking about what needs to be updated and what does not, you simply submit your code to the online distributed version control system and retrieve it on the other computer that you use. The system merges changes automatically, so you don’t have to think much. Bonus: your code is online, available for easy sharing.

Software Carpentry is an excellent resource for learning how to use these tools.

How much do you pay for your analysis software? MS Office: € 139 (student) / € 379.01 / € 699 (pro), SPSS: € 25 (student/year) / € 360

You should first ask yourself what kinds of analyses you typically do. If all you care about is a t-test, linear regression, and a correlation, then it may be more convenient to do it all in Python. For more advanced statistical tests, use R.
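As a rough illustration of how little code such basic analyses need, here is a minimal sketch that uses only Python’s standard library. The function names (independent_t, pearson_r, linear_fit) and the reaction-time numbers are made up for this example. The t-test is the classic equal-variance Student version, and only the t statistic is computed, not the p-value.

```python
import math
import statistics


def independent_t(a, b):
    """Student's t statistic for two independent samples (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def _sums(x, y):
    """Sums of squares and cross-products around the means of x and y."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    _, _, sxy, sxx, syy = _sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    mx, my, sxy, sxx, _ = _sums(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical reaction times (in seconds) from two search conditions.
rt_parts = [1.9, 2.1, 2.4, 2.0, 2.2]
rt_wholes = [1.1, 1.3, 1.0, 1.2, 1.4]

print("t =", independent_t(rt_parts, rt_wholes))
print("r =", pearson_r(rt_parts, rt_wholes))
print("slope, intercept =", linear_fit(rt_parts, rt_wholes))
```

For real analyses, a dedicated package is likely to be more convenient: SciPy, for instance, provides scipy.stats.ttest_ind, scipy.stats.pearsonr, and scipy.stats.linregress, which also return p-values.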

For fMRI analyses, there is NiPy for Python users and AFNI and FSL for everybody. For multi-voxel pattern analyses, you can use PyMVPA or LIBSVM. The bottom line is that these tools are powerful enough to offer you one-line solutions to most of your analysis needs (e.g., a searchlight analysis).

Present data at a conference (Powerpoint, Illustrator). Will you ever see anybody’s slides/poster again?

For immediate dissemination, use QR codes (made with, for example, this QR code generator), which can easily be put on your slides or posters and scanned with a smartphone or tablet.

Another simple solution is to put them on your lab’s website (in our lab, we even have a dedicated website called Gestalt Revision).

Or how about your own website? If you think it’s difficult to implement and you don’t have a clue of where to start with HTML, you’re so 2000s. Nowadays personal websites are known as blogs and are trivial to set up and to even look like a proper (ugly) lab website, as does mine. Check out WordPress or Blogger for the most popular solutions. As a bonus, having your own website allows you to know who cares about your stuff because most blogs keep track of user statistics.

Finally, why use Powerpoint to make your slides/poster? Scribus, Inkscape, or Beamer are all decent FOSS solutions.

Write a paper (MS Word). Ever got lost among all different versions of your paper?

When a number of authors collaborate on a paper, inevitably a number of different manuscript versions get emailed and start accumulating on your hard drive. This can easily turn into a mess and also must be organized in a serial fashion (first me, then my prof, then another prof, then me…) in order not to get lost across all the changes. Instead, Google Docs offers an online version of a word processor where you can revise, comment, see everybody’s changes and always go back to an earlier version. Thus, there is a single “master” copy that everybody can work on. Furthermore, it lives online, so you can access it anywhere.

Hate formatting?

A Word document does not look like a professional journal article unless you put in a substantial amount of effort. If you care about generating papers that are well-structured and good-looking, LaTeX is the strongest (and FOSS) alternative. LaTeX is widely used in the hard sciences and by publishers as it makes writing equations a snap. While writing text in LaTeX is not as straightforward as in Word (it’s not WYSIWYG), LyX comes close. There are also online implementations of LaTeX for collaboration, for example, ScribTeX.

Submit it to a journal. How much time do you spend preparing your manuscript for submission? Figuring out the journal’s particular requirements? Redoing all of that for another journal? How much do you pay for submission/publishing? For example, Journal of Vision: $ 85 per page ($ 510 for 6 pages), The Journal of Neuroscience: $ 950 / $ 475 (brief), PLoS One: $ 1350, Psychological Science: $ 0.

If high fees bother you, ask: why do we need journals at all? What do they do for us? Some copy-editing, text formatting, and putting that on paper (which is, arguably, a waste), and we pay exorbitant fees for that? Also, don’t forget that libraries across the world are paying to access the journals, so the publishers benefit that way too (hence the $0 price tag on Psychological Science). Surely, some use the money simply to remain in business (although again, where does the $1350 go at PLoS One?). But many are only concerned about their profits because it’s a business, after all.

In this light, arXiv seems to me like a much more liberal publishing platform. Anybody can publish there, formatting is done by LaTeX (as arXiv focuses on hard science), no fees involved. This platform is highly popular in hard science and, in some cases, even the best publications on arXiv never end up in regular journals because arXiv is just good enough. A note of caution: some journals tend to treat publishing on arXiv as a proper publication and subsequently will not consider your manuscript for publication in their journal anymore.

Of course, arguably the most important contribution of publishers is managing the peer-review system. They oversee its proper implementation, and that is important. Surely, there are other ways to go about it, and some ideas are presented in the next point.

Get reviews*. Stupid reviewers and unreasonable requests? 3 major revisions? How many years does it take from obtaining results to publishing them?

*negative

This is a very broad topic which I am not very familiar with, but it seems to me that publishing the peer reviews along with the publication, without any names attached, could go a long way already. Some people, like Nikolaus Kriegeskorte, are arguing for an Open access + Open post-publication peer review model. Simply put, you publish on something like arXiv and people freely read the paper, comment on it, and rate it. The best ones could even be picked up by regular journals and published there. To make sure constructive feedback is given, real names could be used, and people could gain or lose points according to whether others think their comments were useful, in a similar way to how forums are organized.

Revise and get paper published. Who can read your paper? Did you know that you did not hold the copyright to your own creative work? And what did you publish? Just some text? Where is raw data, code, analyses? Maybe you made a mistake? Maybe you cheated?

The answer is simple — publish everything: data, analysis workflow, code that generated your figures, full text (after the paper gets published, and put a disclaimer like this one to deal with copyrights)…

Get media coverage. Will they see your paper? Will anybody see your paper? And if they do, what will they find on your website? A conveniently double-spaced paper in the Times New Roman typeface?

There is no problem if you used Google Docs or LaTeX and created your website… Everything necessary is easily available online already. In order to help laypeople, consider providing a simple explanation, just like it is done on the Gallant Lab website. It is also a nice way of showing your participants where all their efforts went.

Concluding remarks

Moving to the cloud. The framework provided here can be summarized as “moving to the cloud!” By maintaining everything online you:

  • no longer depend on a particular platform
  • get free backup
  • can make everything accessible to everyone, everywhere (use Dropbox, Wuala)
  • can instantly collaborate, e.g., via GMail-GTalk-Google Docs

Open issues:

  • Where do you share your data? Maybe Dryad, Dataverse Network
  • No facebook for scientists… Maybe Academia.edu, Research Gate will catch on
  • There aren’t mature open alternatives for everything yet, e.g., Powerpoint, SPM
  • Privacy issues when moving to the cloud
  • Requires effort: more coding, more command line interfaces, more things to do

What to do. There is a lot that can be done and that may be overwhelming. Don’t stress. Simply start by publishing full data, your code, and full text. It’ll do plenty.

See also: slides from a presentation at a lab meeting

2011-09-20

Emergence of perceptual Gestalts in the human visual cortex: The case of the configural superiority effect

Jonas Kubilius, Johan Wagemans, & Hans P. Op de Beeck

Many Gestalt phenomena have been described in terms of perception of a whole being not equal to the sum of its parts. It is unclear how these phenomena emerge in the brain. We used functional MRI to study the neural basis of the behavioral configural-superiority effect (i.e., visual search is more efficient when an odd element is part of a configuration than when it is presented by itself). We found that searching for the odd element in a display of four line segments (parts) was facilitated by adding two additional line segments to each of them (creating whole shapes). Functional MRI–based decoding of neural responses to the position of the odd element revealed a neural configural-superiority effect in shape-selective regions but not in low-level retinotopic areas, where decoding of parts was more pronounced. These results show how at least some Gestalt phenomena in vision emerge only at the higher stages of visual information processing and suggest that feed-forward processing might be sufficient to produce such phenomena.

journal link | source code | poster at SfN 2010 | poster at ECVP 2011 | entry at gestalt revision

2011-09-14

Vilniaus jėzuitų gimnazija (Vilnius Jesuit Gymnasium): A bit about brain research

A popular, accessible talk about brain research from various angles and about my “path to science”. There will be experiments and a Q&A!

Venue: Vilniaus jėzuitų gimnazija, Assembly Hall
Date: September 16, 12:00
Duration: 1 hour
Slides: Kažkiek apie smegenų tyrimus (A bit about brain research)

2011-08-29

Emergence of perceptual Gestalts in the human visual cortex: The case of the configural superiority effect

Jonas Kubilius, Johan Wagemans, & Hans P. Op De Beeck

Many Gestalt phenomena have been described in terms of perception of a whole being not equal to the mere sum of its parts. It is unclear how these phenomena emerge in the brain. We used functional magnetic resonance imaging (fMRI) to study the neural basis of the behavioral configural superiority effect, where a visual search task for the odd element in a display of four line segments (parts) is facilitated by adding an irrelevant corner to each of the line segments (whole shapes). To assess part-whole encoding in early and higher visual areas, we compared multi-voxel pattern analysis performance on detection of the odd element. Our analysis revealed a neural configural superiority effect in shape-selective regions but not in low-level retinotopic areas, where decoding of parts was more pronounced. Moreover, training pattern classifiers to the whole shape and attempting to decode parts failed in the most anterior region of these shape-selective regions, suggesting a complete absence of part information in the pattern of response. These results show how at least some Gestalt phenomena in vision emerge only at the higher stages of the visual information processing and suggest that feedforward processing might be sufficient to produce them.

This poster received The Best Student Poster Award (six recipients in total).

pdf (page size), svg (original), F1000 Posters

2011-08-24

NMA summer session 2011

Event: Summer session of Nacionalinė moksleivių akademija (National Student Academy)
Venue: Nida Secondary School
Date: August 16-18, 2011
Note: Due to copyright restrictions, the slides are provided for educational purposes only, unless indicated otherwise on the slides themselves

2011-08-3

Mirror-image sensitivity and invariance in object and scene processing pathways

Daniel D. Dilks, Joshua B. Julian, Jonas Kubilius, Elizabeth S. Spelke, & Nancy Kanwisher 

Electrophysiological and behavioral studies in many species have demonstrated mirror-image confusion for objects, perhaps because many objects are vertically symmetric (e.g., a cup is the same cup when seen in left or right profile). In contrast, the navigability of a scene changes when it is mirror reversed, and behavioral studies reveal high sensitivity to this change. Thus, we predicted that representations in object-selective cortex will be unaffected by mirror reversals, whereas representations in scene-selective cortex will be sensitive to such reversals. To test this hypothesis, we ran an event-related functional magnetic resonance imaging adaptation experiment in human adults. Consistent with our prediction, we found tolerance to mirror reversals in one object-selective region, the posterior fusiform sulcus, and a strong sensitivity to these reversals in two scene-selective regions, the transverse occipital sulcus and the retrosplenial complex. However, a more posterior object-selective region, the lateral occipital sulcus, showed sensitivity to mirror reversals, suggesting that the sense information that distinguishes mirror images is represented at earlier stages in the object-processing hierarchy. Moreover, one scene-selective region (the parahippocampal place area or PPA) was tolerant to mirror reversals. This last finding challenges the hypothesis that the PPA is involved in navigation and reorientation and suggests instead that scenes, like objects, are processed by distinct pathways guiding recognition and action.

journal link, on MITnews

2011-08-2

LJMS summer 2011

Event: Summer camp of Lietuvos jaunųjų mokslininkų sąjunga (Lithuanian Union of Young Scientists)
Venue: Molėtai Astronomical Observatory
Date: July 31, 17:00
Slides: Regos sistemos modeliavimas (Modeling the visual system)

Answers to questions left unanswered during the lecture

Perceptron: what is g(x) needed for (slides 25-26)? Wouldn’t g(x) = x simply suffice?

There are several reasons to use a more complex function. On the one hand, artificial neural networks try to retain some resemblance to biological neural networks, and neurons have certain limits on how many times per second they can transmit an impulse. If g(x) were equal to x, there would be no such limits. Therefore, in practice one uses, for example, the Heaviside step function or a sigmoid function, both of which have an upper bound. On the other hand, if we let the output grow without bounds, the whole neural network may become unstable and it may not be possible to train it (to reach a local minimum).
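To make the contrast concrete, here is a minimal Python sketch of a single perceptron-like unit with three choices of g(x): the identity, the Heaviside step, and a sigmoid. The function names, weights, and inputs are made up for illustration and do not come from the slides. The point is only that the identity output grows in proportion to the input, while the other two stay bounded by 1.

```python
import math


def identity(x):
    """g(x) = x: the output has no upper bound."""
    return x


def heaviside(x):
    """Heaviside step: the unit is either silent (0) or firing (1)."""
    return 1.0 if x >= 0 else 0.0


def sigmoid(x):
    """Logistic sigmoid, written so that large |x| does not overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def perceptron(inputs, weights, bias, g):
    """Weighted sum of the inputs passed through the activation function g."""
    x = sum(w * i for w, i in zip(weights, inputs)) + bias
    return g(x)


weights, bias = [0.5, 0.5], 0.0  # illustrative values only
for scale in (1, 10, 100):
    inputs = [scale, scale]
    print(scale,
          perceptron(inputs, weights, bias, identity),
          perceptron(inputs, weights, bias, heaviside),
          perceptron(inputs, weights, bias, sigmoid))
```

With the identity function, a hundredfold stronger input gives a hundredfold stronger output, whereas the step and sigmoid outputs saturate, much like a neuron that cannot fire faster than some maximum rate.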

Change in neuronal selectivity (slide 48): why does the neuron’s selectivity decrease for the P stimulus (the puppy) and increase for the N stimulus (the rhinoceros)? After all, if the P and N stimuli are shown next to each other in time, the neuron should remain selective for the P stimulus and, in addition, become selective for the N stimulus.

Note the experimental design (slide 47, B). Two stimuli are chosen for the neuron: a P (preferred) stimulus (the puppy), to which the neuron responds more actively than to the N (non-preferred) stimulus (the rhinoceros). Then the P stimulus is associated in time with an N stimulus twice its size, and vice versa. Hence, the neuron learns to treat a medium-sized P stimulus and a large N stimulus as the same object. Only the latter, twice-as-large stimuli are used in testing. What can be expected? That the neuron, which ordinarily likes P stimuli, will now also respond actively to the large N stimulus. And likewise that the same neuron, which ordinarily dislikes N stimuli, will start to respond more weakly to the large P stimulus.

2011-04-28

VU FsF spring 2011

Event: Cognitive Psychology I: introduction, sensation, perception, attention
Venue: Faculty of Philosophy, Vilnius University
Date: April 27, 15:00; May 16, 11:00
Slides: Mažutėliai Geštalto trupiniai (Tiny crumbs of Gestalt)

2010-11-15

Feedforward emergence of perceptual gestalts in the human visual cortex

Jonas Kubilius, Johan Wagemans, & Hans P. Op De Beeck

How does our visual system combine the features or parts in a complex display to provide a percept of a whole? Intense behavioral work in Gestalt psychology has described a number of phenomena where this whole is not equal to a mere sum of its parts. However, precise unifying principles of such Gestalt effects have remained elusive. Current neuroscientific models suggest that these phenomena are the consequence of the interplay between bottom-up, feedback, and lateral connections in the hierarchical visual system. Here we approach this question from a different perspective by investigating the neural basis of Gestalt formation. In our functional magnetic resonance imaging (fMRI) paradigm, we utilized a configural superiority effect, where a visual search task for the odd element in a display of three lines oriented at 45 degrees and one at 135 degrees (parts) is facilitated by adding an irrelevant corner to each of the lines, forming three arrows and a triangle (whole shapes). To assess the extent of grouping in early and higher visual areas, we compared multivariate pattern analysis performance on the same task, detection of the odd element, using the fMRI activity pattern in the primary visual cortex (V1) and shape-selective lateral occipital complex (LOC). The behavioral advantage for searching among shapes rather than lines was reflected in a better classification performance in LOC but not in V1, where decoding of parts was more pronounced. This suggests that the configural superiority effect may arise by predominantly feedforward processes. Moreover, even when we trained the classifier on whole shapes in LOC, its decoding performance on parts remained as poor as when we trained on those parts, indicating that the representation of a whole shape bears no similarity to the representation of its parts at these higher stages of visual processing. Simulations confirm that these findings are consistent with a feedforward model of vision, HMAX. 
Taken together, these results show how at least some Gestalt phenomena in vision are consistent with and caused by the feedforward processing of visual shape.

pdf (page size), svg (original)

2010-09-13

Tyrėjų naktis 2010 (Researchers’ Night 2010): The human visual system

You rarely ask yourself: how exactly do I see all these things in front of me? This seemingly elementary question quickly gets complicated when you try to make, say, a computer understand what it sees. In this lecture I will talk about my own and other scientists’ attempts to study the principles of how vision works. We won’t be bored: I will show optical illusions, demonstrate artificial neural networks, and we will have a friendly discussion about brain research in general.

Event: „Tyrėjų naktis 2010“ (Researchers’ Night 2010)
Venue: Faculty of Philosophy, Vilnius University
Date: September 24, 17:00
Slides: Žmogaus regos sistema (The human visual system)
