<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://evcu.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://evcu.github.io/" rel="alternate" type="text/html" /><updated>2026-05-17T00:35:49-04:00</updated><id>https://evcu.github.io/feed.xml</id><title type="html">evcu</title><subtitle>Personal webpage of a guy from Turkey named Utku Evci. Recently @Montreal, 2018 Google AI Residency Program.</subtitle><author><name>Utku Evci</name><email>ue[224+1]@nyu.edu</email></author><entry><title type="html">Coincidence or Destiny</title><link href="https://evcu.github.io/personal/canmore/" rel="alternate" type="text/html" title="Coincidence or Destiny" /><published>2026-03-24T00:00:00-04:00</published><updated>2026-03-24T00:00:00-04:00</updated><id>https://evcu.github.io/personal/canmore</id><content type="html" xml:base="https://evcu.github.io/personal/canmore/"><![CDATA[<p>A sunny, tipsy late March afternoon, wanted to start this post. Happy hours grew in me since I moved to Canmore, Alberta. And maybe that’s the culprit.</p>

<p>Regardless, it’s peak March in the Bow Valley. Snow is receding in the mountains and on parts of the valley trails. It’s still winter, but the daylight and the lack of snow in the valley tell you summer is around the corner; no spring around here, really.</p>

<p>I think I first heard about Banff from an Australian friend when I was studying in NYC. Oh, Australians know how to travel. Regardless, it is hard to miss when you live in Canada, especially if you are a person who likes the views and the trails. Still, I only traveled here after spending 2 years in Montreal, and maybe only because it was the Covid summer. It was more or less impossible to leave the country in the months after Covid started, and a natural destination was the Canadian Rockies. So went the first visit. Instagram shows my first post from Banff, the Tunnel Mountain road, on 26th June 2020. A classic <a href="https://youtu.be/-IXDwrfKCaM">le Tour de Bow</a>, Aylmer Lookout, Stanley Glacier, Lake Louise, Moraine Lake, Marble Canyon, Wapta Falls, Icefields, Jasper… 9 days it was. Then came the <a href="https://www.utkuevci.com/personal/covid-year/">unexpected covid year</a>, followed by “Canada calling for PR”. In June 2021, I was in Alberta again. This time I ventured into BC (Revelstoke, Nelson and Fernie), extending the horizon <a href="https://www.youtube.com/watch?v=UtLzJmxAfFU">a bit more</a>: Valhalla, Bugaboos and Assiniboine for the first time, most of it w/ L.!</p>

<p>It was the Mountain Collective pass that I bought for the 2022-2023 winter that brought me to <a href="https://www.youtube.com/watch?v=6Do0tJODM5o">Bow Valley in the winter</a>, Islandman in the background. On March 5th, Instagram says, I was at Sunshine Village, and it looks like I stayed here for about 14 days; perfect timing. The beauty of this place hit me differently in winter and I remember dreaming and saying in Canmore:</p>

<p>“How would it feel to live here and would I ever get used to these views?”</p>

<p>It was a brave new world during Covid and dreams ran “wild”. L. was with me looking into my eyes with excitement.</p>

<p>Well, that excitement didn’t last much longer. She wanted a baby and I felt that a fulfilling life together didn’t require that. That led to the decision to separate, meaning no more of the mighty old-port loft we were enjoying, a symbol of fun Montreal life. It was time to close our house. I was also scared of what it might feel like to stay in Montreal and live life without L. We worked at the same office and had many common friends. I decided to leave. But I needed to stay in Canada for a few more years and be done with getting a first-world passport for good. It was also easy to go fully remote in spring 2023. Weeks before the fully-remote bus left the last stop, I hopped on it. It was time to make a dream come true and move to the Bow Valley.</p>

<p>In hindsight it feels like it should have been an easy choice, but nothing about it was easy. After spending 5 years there, I finally had a community of friends in Montreal. Throwing parties outside and having a decent support network. Enjoying and improving my DJ skills. I also knew the city, had a car and was ready to buy a home. Think of giving that up and going across the country to where you know “no-one”. No, it wasn’t easy.</p>

<p>Packing and moving happened in good moods. So did the 6-day cross-country trip from Montreal to Calgary. It felt so liberating to move your home to the west, like many families did over so many years. Seeing the Great Lakes and the Great Plains. So did the month I spent in hostels and campgrounds, trying to find a house. I did many fun hikes and started to <a href="https://www.youtube.com/watch?v=6Y7XvtJkw6U">explore the peaks around</a>.</p>

<p><img src="/assets/images/canmore/temple23.png" alt="jacob" /></p>

<p>I had a few friends visiting early on (Kelvin, Jacob). It was nice to have them; I was pretty lonely, to be frank. I tried to continue DJing and being social, but without much reward. I got a place with a beautiful view of Grotto, like I had dreamt earlier in March. Working with views. Honestly, the views lasted a few months. You don’t eat all the time. Similarly, you don’t need to see it all the time.</p>

<p><img src="/assets/images/canmore/jacop.png" alt="jacob" /></p>

<p>Learning new things like backcountry skiing, climbing and mountaineering felt much better. After all, if you are not doing those, why are you living in the mountains? Why try to go to the beach if you are in Canada? Finally I got to love winters: I improved my skiing and winter knowledge through many courses like AST-1, AST-2, Ski Mountaineering and Crevasse Rescue. I did quite a few backcountry and resort ski days. I tried skate skiing and did plenty of winter running. The ultimate Canadian boot camp.</p>

<p><img src="/assets/images/canmore/ast2.png" alt="ast2" /></p>

<p>Winters can be as much fun as summers, if you know what to do. Nothing matches walking on your own, in winter, around Assiniboine; you set your track and it disappears. Crazy how much I did and learned during the 2023-2024 winter. I didn’t really socialize much, nor did I DJ or party. It was just play in the great wilderness, work and sleep. And maybe all of this originated from the loneliness and the need to prove to myself that I can have fun and grow despite it all. I went to the Assiniboine Naiset Huts on my own. It was a moving experience. In 3-4 months I had become comfortable enough to go into deep winter and walk around Assiniboine. Such a special place. Skiing out on my own, I had a small accident where I pulled my calves at Assiniboine Pass. Luckily I didn’t become that guy who needed rescue; just a painful 20km walk. This trip was so liberating, maybe even a phoenix waking.</p>

<p><img src="/assets/images/canmore/assiniboine24.png" alt="assiniboine24" /></p>

<p>Then things just got better. L. and I got back together, kind of. Cem joined the mountain adventures during the summer and L. visited, too. It was a fun, proper summer, minus the fires. Nice to run alone, also nice to run with friends.</p>

<p><img src="/assets/images/canmore/assiniboine24.png" alt="cem24" /></p>

<p>Spring started with King of the Hills, where you run up mountains for 4 weeks, and it is a great wake-up call for the body. A summer full of running and exploration. Nothing serious, but still I got much fitter. I lost some weight and got better at running longer. This resulted in my very first ultra run, the inaugural Black Lung Ultra in Nordegg. It was such a fun journey. Though I got lost during the last section, I was able to score third place. The funny thing was I didn’t even try hard to do this. I just needed to run slower, walk sometimes and live in the mountains. And mountains shape you.</p>

<p><img src="/assets/images/canmore/assiniboine24.png" alt="utku24" /></p>

<p>The second year started with a better rental place. Cheaper, better, 3 floors and some privacy. I had a few roommates so I could feel better when traveling during the dark months of November to January. I tried ice climbing and it was cold. Then L. arrived in February for her sabbatical, kind of repeating what I did a year earlier and discovering the winter wonderland. We started doing outings together and went to Assiniboine together this time. It was special, again.</p>

<p><img src="/assets/images/canmore/ski25.png" alt="ski25" /></p>

<p>We decided to stay for another summer. Summer was around the corner and this time we did more mountaineering with our awesome guide Tim. I went to the Bugaboos with L. again, 4 years later, this time with gear, and we did some fun climbing. “Things escalated,” we said, and then we did Sir Donald. Being fit and doing long days; using your body and mind. So much fun.</p>

<p><img src="/assets/images/canmore/bugaboo25.png" alt="bugaboo25" /></p>

<p>We also did some more long runs, like the Golden Ultra. I was faster than ever, and it was time to pack. I got my Canadian passport in early 2025 and the rest of the world was calling. Or it felt like that somehow. Maybe the European Alps, maybe California. So, I started packing up the home. The lease was ending conveniently at the end of September. We did one last run that I had wanted to do since I moved in: running to Bow Hut from Bow Lake. Such a treat to run by the glacier late in the afternoon; almost no one on the trails. Just the sun, and the rock and the ice and the wind. And you.</p>

<p><img src="/assets/images/canmore/wapta25.png" alt="wapta25" /></p>

<p>So went the 2 years I spent in Canmore. Now I’m back in the valley for skiing and arranging the last bits of moving logistics. Destination California. Anti-Canada. Good weather; but you have Lake Tahoe! We’ll see how that compares to the great Canadian wilderness. Regardless, I’m impressed by the redwoods and the hills around the Bay Area. It should be a smoother ride back to the city.</p>

<p>Funnily enough, remote work was fine. I was, I think, as productive and impactful as I have ever been. I did some useful things and continued growing, and by the end of 2025 I was co-leading Gemini Nano pretraining. I did work a lot sometimes, but I also took the day off when there was fresh snow or a course. I developed a nice cooking routine. Oats in the morning, salad for lunch and something fun for dinner. And weekends for some bread and olive oil, when not doing an early start. It was a simple life, and busy sometimes.</p>

<p>Canmore changed me. First of all, it taught me alignment. Align your life with the winds of change. Especially you, you who have options and means. Don’t try to run trails if you are in the city; similarly, don’t live in the city if you wanna hit the trails. If you wanna work hard, go work hard in the main ship. If you are blocked at work, go do something else. Life is short but long enough to pipeline and time things. Start from priorities and shape your life so that those things are aligned and easy. Once there, look around and ask yourself what is aligned with your current situation. There will be more than you initially imagined, that’s for sure.</p>

<p>Also, I learned (again) that I don’t know what I don’t know. We are incapable of imagining life. Only the rough contours. And nothing compares to actually doing and living the dream. Maybe it is right, maybe it is a mistake. There is only one way to find out. So find out.</p>

<p>Canmore made me more capable in the outdoors. Made me fit. Made me lonely and happy. Happy being alone. Happy doing my own thing, with not much to coordinate. I have 3 pairs of skis now and a bunch of gear. And the right outfit for every weather. A very light tent, two of them to be precise, and a climbing rope. So many different bags of all sizes. My favourite winter shoes are the NF traction mules now, and no more buying running shoes without trying them on. Trail running shoes are the best.</p>

<p>So, in short, it felt great living here and indeed I got used to the views. Came for the views, stayed for the trails and became a mountain goat who can ski. Let’s see how this mountain goat will do in Redwood City.</p>]]></content><author><name>Utku Evci</name><email>ue[224+1]@nyu.edu</email></author><category term="personal" /><category term="comments" /><category term="personal" /><summary type="html"><![CDATA[alignment is the new norm]]></summary></entry><entry><title type="html">The Science and Engineering of Large Language Models Workshop Slides</title><link href="https://evcu.github.io/ml/aims-lecture/" rel="alternate" type="text/html" title="The Science and Engineering of Large Language Models Workshop Slides" /><published>2025-05-25T00:00:00-04:00</published><updated>2025-05-25T00:00:00-04:00</updated><id>https://evcu.github.io/ml/aims-lecture</id><content type="html" xml:base="https://evcu.github.io/ml/aims-lecture/"><![CDATA[<p>It has been a while since I last gave a talk. I guess that’s what happens when your research and interest become hot and you stay. I was invited to give a talk at the Science and Engineering of Large Language Models Workshop at AIMS South Africa.</p>

<blockquote>
  <p>The African Institute for Mathematical Sciences (AIMS) is a pan-African network of Centres of Excellence for post-graduate training in mathematical sciences, research and public engagement in Science, Technology, Engineering and Mathematics (STEM). AIMS is enabling Africa’s talented students to become innovators driving the continent’s scientific, educational and economic self-sufficiency.</p>
</blockquote>

<p>I’m glad I decided to join the workshop and travelled across the world to Cape Town. In between Gemini deadlines, I was able to distill years of learning and research into a 2-hour class on compression and efficient fine-tuning. You don’t know how much you know until you start explaining things. It was a nice reminder that things that are trivial for you can be valuable learning for others.</p>

<p>So here are the slides I’ve compiled for the two lectures. Quite a few slides are taken from (and attributed to) existing material online created by colleagues, which enabled me to prepare for this talk between deadlines (thank you!).</p>

<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vR4uGovaseHm9cKmsdMtsdR6MtVvT9qIc40ZZv1WWO1TNslvUrF94mOBiUVg7ISxanPwKEr_e7fqk_v/pubembed?start=false&amp;loop=false&amp;delayms=3000" frameborder="0" width="960" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>

<iframe src="https://docs.google.com/presentation/d/e/2PACX-1vSNveMguuXqnJ3RJbVE1BRcewVvtIL_Cgf8T_V9E30N4p5yijEzSt72KXGBYnGe7VLUJc5KpTeM7-8f/pubembed?start=false&amp;loop=false&amp;delayms=3000" frameborder="0" width="960" height="569" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>]]></content><author><name>Utku Evci</name><email>ue[224+1]@nyu.edu</email></author><category term="ml" /><category term="teaching" /><category term="ml" /><summary type="html"><![CDATA[compression and efficient finetuning slides]]></summary></entry><entry><title type="html">The unexpected covid year</title><link href="https://evcu.github.io/personal/covid-year/" rel="alternate" type="text/html" title="The unexpected covid year" /><published>2021-04-25T00:00:00-04:00</published><updated>2021-04-25T00:00:00-04:00</updated><id>https://evcu.github.io/personal/covid-year</id><content type="html" xml:base="https://evcu.github.io/personal/covid-year/"><![CDATA[<p><a href="https://www.youtube.com/watch?v=KvmCxxTG3fM" title="video"><img src="https://img.youtube.com/vi/KvmCxxTG3fM/0.jpg" alt="video" /></a></p>

<p>I will try to keep this one short. I guess I want to mark the time during my last night in Kas, a beautiful town by the Mediterranean sea in southern Turkey. The date is 25th April 2021. I left Canada and came to Turkey in August 2020, roughly 4.5 years after I left the homeland for <code class="language-plaintext highlighter-rouge">fame and fortune</code>. It was the first covid summer, and since I left Canada I have done and learned a bunch of new things which wouldn’t have happened if things were normal. The never-ending summer after spending 5 months in isolation: the worst year becoming one of the best. This is a note to the future. Not for work directly, but for life.</p>

<h2 id="the-covid-winter-is-over">The Covid Winter is Over</h2>
<ul>
  <li><em>FoMO or Indecisiveness creates stress.</em> It was funny: I woke up one morning around March 2020, during the early lock-down days, and I felt so happy. There was nothing to miss, and so I was happy being inside, spending the time as I pleased. I spent a bunch of time with plants, cooked and started learning new things like playing the trumpet. These were the funny times when we thought the quarantine would be over soon and I was busy with converting to full time. Now, a year later, things are still as bad as before, if not worse.</li>
  <li><em>Breathe and look for the next thing</em> I don’t know when I got the idea of applying for permanent residency (PR) in Ontario and leaving/subletting my place in MTL. But I guess it was important to me, and I wasn’t too overwhelmed with the Covid situation to think about such things. So I looked for an IELTS exam, and most of them were cancelled due to the Covid situation. The earliest exam was in Calgary, Alberta; so I flew there. People were surprised I was flying during Covid. Honestly, given that everyone had their masks on all the time, it was probably safer than many other places in the city. Bars were open in Alberta, so I enjoyed my first outdoor beer after the long winter and drove to Banff and Jasper. Best time to visit the mountains, with few tourists to be found. I think many people still believe flying is very risky in terms of Covid. Is that the case? Or I guess it is ‘safe’ to believe so.</li>
  <li><em>Maybe keep your ties light</em> I didn’t have family in Montreal, or a house that I bought, or expensive furniture. So it was relatively easy to leave. Honestly, I had a nice setup in Montreal and I could have stayed there and things would have been fine and a year would have ended in a blink. I find it hard to imagine how that would be; it feels boring. Would I have given up staying and left after a while anyway? Maybe this was my destiny. This is certain: what I did was possible because I didn’t have certain ties with Montreal, yet. And this is good and bad news at the same time: I picked the good. It would have been sad to live a life dreaming of an alternative so badly without seeing the beauty and the good and the opportunity in it. Maybe it is possible to do both: having ties, and thinking twice before having them so that you are ready to drop them whenever needed.</li>
</ul>

<h2 id="the-alternative-covid-year">The Alternative Covid Year</h2>
<ul>
  <li><em>Miss and enjoy</em> So mid-August I took off and arrived in Istanbul with many plans. My return ticket was for November, I think, which then got postponed to May. Istanbul has many cats and I was innocently smiling at them when running and preparing for the hike up Mt. Ararat. Climbing the highest point in Turkey was something I had planned for a while and I executed it with my brother. Such a nice trip it was, and an easy one to do/organize in hindsight. Here, in Turkey, I know the people, possibilities and limits. I missed it so badly: the food, the land, the sea, the language, dear friends and family. Those fish, too. Fish on the grill. Again, nothing too wrong about being far away for a while. You miss it, and it makes you happy and smile when you are back. Maybe you have to leave sometimes to miss again.</li>
  <li><em>Dreams I accumulated and consumed</em> Doing and arranging things was much easier at home. Maybe this is the climate: Mediterranean. I did many things that I had dreamed of for a while. Learning to ride horses, scuba diving, swimming almost every day in the sea, riding my motorbike daily, learning to sail, climbing highlands. I did all of those things in 6 months, maybe. Crazy when I think about it. I hustled for so many years in America and it feels like I didn’t do half of what I did this year. Now I know I can do more in life as long as I keep accumulating dreams and am willing to sail towards the unknown.</li>
  <li><em>Discovering places</em> I still have a home in Turkey and after this year I know the land better. I know Cunda and Ayvalik and their sad history better. For example, I know which small-batch olive oil producer I prefer in Ayvalik. I know oil is pressed around November and how it tastes right after harvest. I am a sailor. I belong to the sea as much as I belong to the mountains. Sad that I discovered the underwater world in my late 20s. I know the wonderful Bonjuk Bay community; a place to come back to regularly. I know various fish types now: Akya, Levrek, Cupra, Lufer, Kalkan, Barbun, Kolyoz, Uskumru, Tekir. I know I can carry a 20kg backpack up Kilimanjaro. And the legend: the Land Cruiser. I know the animal kingdom on the land and in the sea better. I am not scared of the sea anymore and can spend hours in it. I know I can dance for 12 hours non-stop and I desire that. I know the lemon tree has thorns and the yellow lemons are up on the trees from Jan to May. Malta plum is ready in April and figs are due in October. Chestnuts are ready in December. 1000-meter-high hills are 10min away from Kas and there are so many ancient cities around. I know why Kayakoy is special and where to stay there when I am back. I know there are 2 summers for me in Turkey, one in October/November and the other in March/April. Oh, and there is Tanzania.</li>
  <li><em>When everyone is staying maybe it is time to go</em> Places are beautiful and unique and we, people, ruin them most of the time. It was so special to be in Tanzania during Covid. So special to be one of the first to get into the Ngorongoro crater in the early morning and so special to climb Kilimanjaro seeing maybe 5 other people in a day. Being able to reserve special camp sites in Serengeti at the gate. And working from Zanzibar Utupoa. Crazy times. I wish I had had more time to discover during the Covid year. I skipped Egypt and possibly many others. I worked instead. Maybe I shouldn’t have done that. I should remember this. I need to stop sometimes and take the opportunity. I don’t know when the next time will be; I don’t know how long this life will last either.</li>
  <li><em>And the cycle repeats?</em> I think I got used to this: the beautiful sea, the beautiful sun and the beautiful mountains. Delicious fish and fresh tomatoes. I got used to being in Kas, and yet it feels difficult to leave. So many memories and only a few of them are recorded. More to be missed and forgotten.</li>
</ul>

<p>I got my permanent residency in Canada a few weeks ago. It is the very thing that I pushed for during the first summer of Covid, and now it is time to go back in order to claim it. 3*365 days need to be spent in Canada to become a citizen, which would make me a first-world citizen: fewer borders and more opportunities. Oh, yes, life is unjust and some of us need to do things like that, I guess, to prove things. But only we can do something about it; no one else would really care. I don’t know whether I will fulfill this requirement. Maybe I will go to Japan, or Africa or the Netherlands. Australia would be interesting, too. I wonder how badly I will miss the Covid year when I am back in Canada. I am sure summer will help; but winter is ahead and the cycle is being a cycle.</p>

<p>Covid was unexpected and made me realize a new path, a new circle. Now I know better and excited to discover more.</p>]]></content><author><name>Utku Evci</name><email>ue[224+1]@nyu.edu</email></author><category term="personal" /><category term="comments" /><category term="personal" /><summary type="html"><![CDATA[and what I learned during the disruption]]></summary></entry><entry><title type="html">Game of Numbers: RigL at ICLR2020</title><link href="https://evcu.github.io/ml/rigl-iclr/" rel="alternate" type="text/html" title="Game of Numbers: RigL at ICLR2020" /><published>2020-07-15T00:00:00-04:00</published><updated>2020-07-15T00:00:00-04:00</updated><id>https://evcu.github.io/ml/rigl-iclr</id><content type="html" xml:base="https://evcu.github.io/ml/rigl-iclr/"><![CDATA[<p>6 months ago our paper got rejected at ICLR-2020 like many other papers did. At the time I thought of writing about our experience on twitter and then stopped myself from doing that since I didn’t want to be the person who talks about <code class="language-plaintext highlighter-rouge">the bad</code> when it happens to themselves as if they are the only one who matters. 2 months later (March 2020), it still felt important to share this experience and I wrote this post, however decided not making online as we were afraid it might affect the reviews for the ICML re-submission. I can see the tone of the writing
reflects the disappointment I had at the time. I hope it doesn’t increase your stress levels.</p>

<p>We will present our work at ICML 2020 virtual conference tomorrow: my first first-author conference paper. I am happy… but I was not 3 months ago when ICLR results were out. I would like to share why: RigL @ ICLR.</p>

<p>Our paper is called <a href="https://arxiv.org/abs/1911.11134">Rigging the Lottery: Making all Tickets Winners</a> or in short <em>RigL</em>. Erich came up with the title and sometimes when you have the title, it is clear what you need to do. We took on the problem of training sparse neural networks that was observed by many and highlighted by Jonathan in <a href="https://arxiv.org/abs/1803.03635">Lottery Ticket Hypothesis</a>. Building on top of the recent work on dynamic sparse training (<a href="https://www.nature.com/articles/s41467-018-04316-3">SET</a>, <a href="https://arxiv.org/abs/1907.04840">SNFS</a>, <a href="https://arxiv.org/abs/1902.05967">DSR</a>), we showed that it is possible to train sparse neural networks from scratch without the dense parameterization and match or exceed the accuracy of dense-to-sparse methods like pruning. Great experimental results, relatively simple algorithm and fully open-sourced code. We got: 3, 6, 6: <strong>Borderline</strong>.</p>

<p>Yes, reviews are a bit random; but I would like to say a bit more than that. It is a great thing that all reviews are <a href="https://openreview.net/forum?id=ryg7vA4tPB">open</a> and I encourage you to read them all if you have some spare time. In short, we addressed the concerns of reviewers 2 and 3 after the initial review (both of whom gave us a score of 6 (weak accept)); however, they never responded/reacted to our responses and kept their scores as is (this was unfortunate since you need reviewers who want your paper in). We had a long discussion with reviewer 1, which resulted in brand new experiments with 2 new datasets. Later it became clear that the main reason for the low score was:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>The idea of starting from a small, sparse network and expanding it is not novel. DEN [Yoon et al. 18] proposed the same bottom-up approach with sparsely initialized networks, while they allowed to increase the number of neurons at each layer and focused more on continual learning. The authors should compare the two methods both conceptually and experimentally.
</code></pre></div></div>

<p>I was puzzled when I read the DEN paper and I am still puzzled. DEN is a method for <a href="https://arxiv.org/abs/1802.07569">continual learning</a>. It uses sparsity to pick a subset of the existing network when a new task is given and the algorithm adds new neurons (i.e. increased #parameters). Continual learning assumes that the training data will arrive at different times and we should learn from the new data/task when available, whilst not forgetting what we already learned. The paper also claims <code class="language-plaintext highlighter-rouge">using continual learning</code> they achieve better results at classifying images than the common batch supervised setting. In other words, the computer vision field was doing it all wrong training on all classes at once and not using their method. This claim is big and I couldn’t verify their results since the code provided by the authors only had MNIST training, where we couldn’t see any improvement using DEN over regular multi-class training.</p>

<p><em>RigL</em> is a method for training <strong>sparse networks</strong> and the size of the network is constant throughout the training. We responded to the review highlighting that our method <code class="language-plaintext highlighter-rouge">does not grow neurons</code> and it is not a continual learning algorithm (which should be obvious), but we got the following response.</p>
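<p>To make the “constant size” point concrete, here is a minimal sketch of a drop-and-grow step on a single layer’s connectivity mask. It is an illustration, not our released implementation: the function name <code class="language-plaintext highlighter-rouge">rigl_style_update</code>, the <code class="language-plaintext highlighter-rouge">fraction</code> parameter and the choice to grow only among connections that were inactive before the step are assumptions made for this example. It drops the lowest-magnitude active weights and activates the same number of inactive connections with the largest gradient magnitude, so the number of active connections never changes and no neurons are added.</p>

```python
import numpy as np


def rigl_style_update(weights, mask, grads, fraction=0.3):
    """One illustrative drop-and-grow step on a single layer.

    Returns a new boolean mask with the same number of active
    connections as `mask` (hypothetical helper, not the paper's code).
    """
    mask = mask.astype(bool)
    n_active = int(mask.sum())
    n_inactive = mask.size - n_active
    # Number of connections to swap, capped by what is available to grow.
    k = min(int(fraction * n_active), n_inactive)
    if k == 0:
        return mask.copy()

    new_mask = mask.ravel().copy()

    # Drop: the k active connections with the smallest weight magnitude.
    # Inactive entries get +inf so they are never selected here.
    drop_scores = np.where(mask, np.abs(weights), np.inf).ravel()
    new_mask[np.argsort(drop_scores)[:k]] = False

    # Grow: the k previously inactive connections with the largest
    # gradient magnitude. Active entries get -inf so they are skipped.
    grow_scores = np.where(mask, -np.inf, np.abs(grads)).ravel()
    new_mask[np.argsort(-grow_scores)[:k]] = True

    return new_mask.reshape(mask.shape)
```

<p>The layer shape and the count of non-zero connections are identical before and after the call; only which connections are active changes. That is the opposite of a method that expands the network by adding neurons.</p>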

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>I believe that you could compare against DEN-Finetune with multilayer perceptron as the base network, using the codes provided at the git repository you linked. Timestamped inference should not be an issue with the final finetuned DEN (DEN-Finetune) at all.

Since I do not find the idea of starting with a sparse network and growing it up in a bottom-up manner as novel, as it is already done in DEN, without experimental comparison against it I do not believe that the paper has a sufficient novelty or advantage over it.

Thus I will stick to my original rating of weak reject.
</code></pre></div></div>

<p>Yes, again the reviewer implied that our method RigL grows networks and indicated that they would keep their score. Well, either we failed to convey our message or it was a lost cause from the beginning: I don’t know. What do you do when you get a message like this half-way through the rebuttal? When the destiny of your paper depends on doing a comparison that seems (and probably is) irrelevant? With hindsight, I wish we had stopped at this point, but as you may guess, we spent the day and the night and did the comparison to which the fate of our paper was tied(?).</p>

<p>Code available for DEN was only for the MNIST fully-connected network, which is by all means irrelevant to contemporary computer vision and to our goal of training modern networks (ResNet, MobileNet) on large datasets (ImageNet-2012). I ran the experiments and crafted a response with my collaborators. Our results showed that:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(A) DEN does not achieve meaningful sparsity -- the networks obtained are only 10% sparse -- far too low to be of any practical benefit.

(B) Using DEN as a pre-training step and then fine tuning the resulting network is not as efficient as training it from scratch. This confirms results of Liu et.al.(https://arxiv.org/abs/1810.05270).

(C) RigL requires ~100x fewer FLOPs than DEN and gets higher accuracy than DEN-Finetune.
</code></pre></div></div>

<p>So, the network DEN trains is only 10% sparse (only 10% of the weights are missing), does not improve the performance over fine-tuning and doesn’t come anywhere near <em>RigL</em> in terms of efficiency, which is kind of expected since continual learning is a more difficult setting than regular multi-class classification. It’s possible to get different results using non-public hyper-parameter settings, and I would be more than happy to go back and update my statement.</p>

<p>We didn’t hear from reviewer-1 after our response. Such a waste of time on our end. We hoped the Area Chair (AC) would note this absurd comparison request and (possibly) the unfair rating. The decision was reject (possibly the decision of AC got overturned at a higher level) and the message from program chairs was:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>A somewhat new approach to growing sparse networks. Experimental validation is good, focusing on ImageNet and CIFAR-10, plus experiments on language modeling. Though efficient in computation and storage size, the approach does not have a theoretical foundation. That does not agree with the intended scope of ICLR. I strongly suggest the authors submit elsewhere.
</code></pre></div></div>

<p>which is, I believe, a very unfortunate thing to say, given that the majority of work published at ICLR or in deep learning doesn’t have any theoretical foundation, and this is obviously not a requirement.</p>

<p>So why did we get this response and result? I think it is all a game of numbers when stakes are high and time is limited. 6-6-3 are not great scores and I am not happy about that. We can do better at motivating our method, and do a better job at rebuttal. We should be more uplifting and less irritated in our responses even when the review we get doesn’t make sense to us. However, this doesn’t justify the disappointing experience we had.</p>

<p>Why did reviewer-1 insist on a comparison with DEN? Is it possible that the reviewer is related to the paper? I don’t know and I don’t want to speculate. If yes, this is very bad ethics. If not, this is very bad quality. Honestly, I liked the idea of the DEN paper and its algorithm. I would have cited and included a comparison if I was writing a paper on continual learning. One shouldn’t condition a good rating on a comparison that is provably unrelated.</p>

<p>I am pretty sure RigL is not the only paper which had an experience like this. Problems with bad reviewing practices have been pointed out by others, too (<a href="https://approximatelycorrect.com/2018/07/10/troubling-trends-in-machine-learning-scholarship/#more-770">link1</a>, <a href="https://medium.com/syncedreview/cvpr-paper-controversy-ml-community-reviews-peer-review-79bf49eb0547">link2</a>). I understand that the reviewing process is inherently noisy: reviewers have limited time and different criteria. So you get several reviewers and the results should be less noisy, right?</p>

<p>No, this is not true if qualified reviewers are in the minority. I keep hearing from the researchers I interact with that the overall review quality has decreased over the years, possibly due to the increased number of paper submissions caused by the inflated value of publishing in top conferences. I think this is very important to understand. <em>There is a problem and the situation is getting worse</em>. What if the bad quality reviews are like a virus that spreads to other researchers when there is no incentive to stop its spread?</p>

<p>This might not seem important to many people, and might even seem inevitable to some. I think it is a very good indicator of the quality of the research we are doing. And quality is important. We are all here for a short time, and if the goal is to push the limits of human knowledge we should be careful about how we spend our time doing research. I don’t think I spent my time well during the rebuttal, and this is partly due to review quality: missing most of our team offsite to work on the rebuttal, not preparing for an important on-site interview (which I failed), and wasting my time and energy hoping (maybe naively) that our score would increase if I could complete the comparison on time. I believe all of us have had similar experiences in our research careers and I don’t think it needs to be this way. Here’s my 2 Kuruş (means <code class="language-plaintext highlighter-rouge">cent</code> in Turkish):</p>

<p>(1) We should incentivize good reviews; rewarding it as we reward publishing papers.</p>

<p>(2) We should have clear guidelines about reviewers promoting their own research while doing reviews, and consequences when those guidelines are not followed.</p>

<p>To finish the post, I would like to invite you to share your stories; I would be more than happy to reference them. Every story is important and sharing them is probably the first step towards a solution.</p>

<p><em>I would like to thank Erich, Pablo, Laura, Erin and Linda for their feedback.</em></p>]]></content><author><name>Utku Evci</name><email>ue[224+1]@nyu.edu</email></author><category term="ml" /><category term="comments" /><category term="ml" /><category term="personal" /><summary type="html"><![CDATA[When numbers are bad, rejection is likely; despite reason.]]></summary></entry><entry><title type="html">Implementing Sparse Networks with MicroGrad</title><link href="https://evcu.github.io/ml/sparse-micrograd/" rel="alternate" type="text/html" title="Implementing Sparse Networks with MicroGrad" /><published>2020-05-10T00:00:00-04:00</published><updated>2020-05-10T00:00:00-04:00</updated><id>https://evcu.github.io/ml/sparse-micrograd</id><content type="html" xml:base="https://evcu.github.io/ml/sparse-micrograd/"><![CDATA[<p><a href="https://colab.research.google.com/github/evcu/micrograd/blob/sparse/sparse-demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a></p>

<p>O lala, time passes fast under quarantine. It has been almost 3 weeks since Andrej Karpathy shared his super light-weight autograd library and I got excited about it. Roughly 2 years ago I had a similar mini-project and wrote about it <a href="https://evcu.github.io/ml/autograd/">here</a>. After seeing Micrograd and its simplicity I decided to spend some time on it.</p>

<p>Andrej’s implementation is written in pure Python, and speed is not a concern. I thought maybe we can accelerate it using sparsity :P. Just kidding…</p>

<p>I realized how easy it would be to implement sparse networks if the building blocks are neurons. It literally took me a few hours to implement sparse networks and the <strong>RigL</strong> algorithm. I think this is a great demonstration of the power of changing abstractions. The abstractions and tools we have influence our work/research greatly, and maybe what we need is a paradigm shift to enable the next big jump in AI. I would vote for neurons as the future building blocks of Neural Networks. But arguing about this is not the goal of this notebook.</p>

<p><img src="/assets/images/micrograd_sparse/output_19_0.png" alt="png" /></p>

<h3 id="plan">Plan</h3>
<ul>
  <li>Check out Andrej’s implementation of <code class="language-plaintext highlighter-rouge">Value</code> <a href="https://github.com/evcu/micrograd/blob/sparse/micrograd/engine.py">here</a>. This is the building block of the back-propagation algorithm and it is the simplest autograd engine I’ve seen.</li>
  <li>Check out the <a href="https://github.com/evcu/micrograd/blob/sparse/demo.ipynb">demo</a>. At the end of this notebook, we will repeat the same experiment using sparse networks.</li>
  <li>We will implement a sparse network (<code class="language-plaintext highlighter-rouge">SparseMLP</code>), which is built from <code class="language-plaintext highlighter-rouge">SparseLayer</code>, which in turn is built from <code class="language-plaintext highlighter-rouge">SparseNeuron</code>.</li>
  <li>Finally, we will train our network on a binary classification task. We will observe the failure of regular sparse training and see how RigL can be used to improve performance considerably.</li>
</ul>

<h3 id="defining-sparse-network">Defining Sparse Network</h3>
<p>Neurons in a layer usually connect to every neuron in the previous layer. This is because of how we define them. We call such networks <strong>dense</strong> (although I would argue they are sparse, too, since they are not connected to a bunch of other neurons in other layers).</p>

<p>Sparse neurons, on the other hand, connect to only a fraction of the neurons in the previous layer, and the sparsity of a neuron can be defined as the fraction of those neurons it is not connected to. So the number of connections in a sparse neuron is given by</p>

\[\#connections = \lceil(1 - sparsity) * \#input\_neurons\rceil\]
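
<p>A small sketch of this count may help. The helper name <code class="language-plaintext highlighter-rouge">n_connections</code> is hypothetical (it is not part of micrograd); it rounds up with <code class="language-plaintext highlighter-rouge">math.ceil</code>, which is what the <code class="language-plaintext highlighter-rouge">SparseNeuron</code> code in this post does, and it also shows what rounding down would give for the layer settings used later in the post.</p>

```python
import math


def n_connections(n_inputs, sparsity, rounding=math.ceil):
    """Number of incoming connections a sparse neuron keeps."""
    return rounding((1 - sparsity) * n_inputs)


# Layer settings used later in this post: (n_inputs, sparsity).
for n_inputs, sparsity in [(2, 0.0), (16, 0.9), (16, 0.8)]:
    up = n_connections(n_inputs, sparsity, math.ceil)
    down = n_connections(n_inputs, sparsity, math.floor)
    print(f"inputs={n_inputs} sparsity={sparsity}: ceil={up} floor={down}")
```

<p>The rounding direction matters for small layers: with 16 inputs and a sparsity of 0.9, rounding up keeps 2 connections per neuron, while rounding down would keep only 1.</p>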

<p>Below, we share the implementation of <code class="language-plaintext highlighter-rouge">SparseNeuron</code>. It calculates the number of connections (rounding up with <code class="language-plaintext highlighter-rouge">math.ceil</code>) and holds the weights in a dictionary <code class="language-plaintext highlighter-rouge">self.w</code>, using the indices of the input neurons as keys.</p>

<p>The RigL algorithm sometimes needs the gradients of non-existing connections. To compute them, we define <code class="language-plaintext highlighter-rouge">self.zero_ws</code>, which is populated with zero-valued weights for the missing connections when <code class="language-plaintext highlighter-rouge">dense_grad=True</code>.</p>

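<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">math</span>
<span class="kn">import</span> <span class="nn">random</span>
<span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="n">partial</span>

<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="nn">micrograd.engine</span> <span class="kn">import</span> <span class="n">Value</span>
<span class="kn">from</span> <span class="nn">micrograd.nn</span> <span class="kn">import</span> <span class="n">Module</span>

<span class="k">class</span> <span class="nc">SparseNeuron</span><span class="p">(</span><span class="n">Module</span><span class="p">):</span>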

    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">nin</span><span class="p">,</span> <span class="n">sparsity</span><span class="p">,</span> <span class="n">nonlin</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
        <span class="k">assert</span> <span class="mi">0</span> <span class="o">&lt;=</span> <span class="n">sparsity</span> <span class="o">&lt;</span> <span class="mi">1</span>
        <span class="n">n_weights</span> <span class="o">=</span> <span class="n">math</span><span class="p">.</span><span class="n">ceil</span><span class="p">((</span><span class="mi">1</span> <span class="o">-</span> <span class="n">sparsity</span><span class="p">)</span> <span class="o">*</span> <span class="n">nin</span><span class="p">)</span>
        <span class="n">w_indices</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="n">sample</span><span class="p">(</span><span class="nb">range</span><span class="p">(</span><span class="n">nin</span><span class="p">),</span> <span class="n">k</span><span class="o">=</span><span class="n">n_weights</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">w</span> <span class="o">=</span> <span class="p">{</span><span class="n">i</span><span class="p">:</span> <span class="n">Value</span><span class="p">(</span><span class="n">random</span><span class="p">.</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mf">0.1</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">))</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">w_indices</span><span class="p">}</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">b</span> <span class="o">=</span> <span class="n">Value</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">nonlin</span> <span class="o">=</span> <span class="n">nonlin</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">zero_ws</span> <span class="o">=</span> <span class="p">{}</span>

    <span class="k">def</span> <span class="nf">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">dense_grad</span><span class="o">=</span><span class="bp">False</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">dense_grad</span><span class="p">:</span>
            <span class="c1"># We need to calculate all gradients therefore introduce zeros.
</span>            <span class="bp">self</span><span class="p">.</span><span class="n">zero_ws</span> <span class="o">=</span> <span class="p">{}</span>
            <span class="n">results</span> <span class="o">=</span> <span class="p">[]</span>
            <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">xi</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
                <span class="k">if</span> <span class="n">i</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">w</span><span class="p">:</span>
                    <span class="n">results</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">w</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">*</span><span class="n">xi</span><span class="p">)</span>
                <span class="k">else</span><span class="p">:</span>
                    <span class="bp">self</span><span class="p">.</span><span class="n">zero_ws</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">Value</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
                    <span class="n">results</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">zero_ws</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">*</span><span class="n">xi</span><span class="p">)</span>

            <span class="n">act</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">results</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">b</span><span class="p">)</span>
            <span class="k">return</span> <span class="n">act</span><span class="p">.</span><span class="n">relu</span><span class="p">()</span> <span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">nonlin</span> <span class="k">else</span> <span class="n">act</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">act</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">((</span><span class="n">wi</span><span class="o">*</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">wi</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">w</span><span class="p">.</span><span class="n">items</span><span class="p">()),</span> <span class="bp">self</span><span class="p">.</span><span class="n">b</span><span class="p">)</span>
            <span class="k">return</span> <span class="n">act</span><span class="p">.</span><span class="n">relu</span><span class="p">()</span> <span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">nonlin</span> <span class="k">else</span> <span class="n">act</span>        

    <span class="k">def</span> <span class="nf">parameters</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="nb">list</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">w</span><span class="p">.</span><span class="n">values</span><span class="p">())</span> <span class="o">+</span> <span class="p">[</span><span class="bp">self</span><span class="p">.</span><span class="n">b</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="sa">f</span><span class="s">"</span><span class="si">{</span><span class="s">'ReLU'</span> <span class="k">if</span> <span class="bp">self</span><span class="p">.</span><span class="n">nonlin</span> <span class="k">else</span> <span class="s">'Linear'</span><span class="si">}</span><span class="s">Neuron(</span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">w</span><span class="p">)</span><span class="si">}</span><span class="s">)"</span>
</code></pre></div></div>
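
<p>The <code class="language-plaintext highlighter-rouge">dense_grad</code> branch works because the pre-activation is linear in every weight: the gradient with respect to a connection is the corresponding input, even when that connection currently has a weight of zero. The sketch below checks this numerically in plain Python, without micrograd. The helper names <code class="language-plaintext highlighter-rouge">pre_activation</code> and <code class="language-plaintext highlighter-rouge">numeric_grad</code> are made up for illustration, and the ReLU and the rest of the network are left out on purpose.</p>

```python
def pre_activation(weights, x, bias=0.0):
    """Weighted sum over the connections present in `weights` (index -> value)."""
    return sum(w * x[i] for i, w in weights.items()) + bias


def numeric_grad(weights, x, index, eps=1e-6):
    """Central finite difference of the pre-activation w.r.t. one weight.

    A missing connection is treated as a weight of exactly 0.
    """
    plus = dict(weights)
    minus = dict(weights)
    plus[index] = plus.get(index, 0.0) + eps
    minus[index] = minus.get(index, 0.0) - eps
    return (pre_activation(plus, x) - pre_activation(minus, x)) / (2 * eps)


x = [1.0, -2.0, 3.0]
weights = {0: 0.5}  # only input 0 is connected; inputs 1 and 2 are "missing"
for i in range(len(x)):
    print(i, round(numeric_grad(weights, x, i), 4))
```

<p>Each printed gradient matches the input at that index, including the two missing connections. This is the signal a zero-valued placeholder weight picks up during the backward pass.</p>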

<h3 id="defining-larger-building-blocks-and-loss">Defining Larger Building Blocks and Loss</h3>
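<p>Next we define <code class="language-plaintext highlighter-rouge">SparseLayer</code>, <code class="language-plaintext highlighter-rouge">SparseMLP</code> and the <code class="language-plaintext highlighter-rouge">loss</code> function. Nothing special here, except that the regular neurons are replaced by sparse neurons. <code class="language-plaintext highlighter-rouge">SparseMLP</code> expects a list of sparsities (one per layer) in addition to the layer sizes. The <code class="language-plaintext highlighter-rouge">loss</code> function reads the dataset from the global arrays <code class="language-plaintext highlighter-rouge">X</code> and <code class="language-plaintext highlighter-rouge">y</code>, as in the demo.</p>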

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">SparseLayer</span><span class="p">(</span><span class="n">Module</span><span class="p">):</span>

    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">nin</span><span class="p">,</span> <span class="n">nout</span><span class="p">,</span> <span class="n">sparsity</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">neurons</span> <span class="o">=</span> <span class="p">[</span><span class="n">SparseNeuron</span><span class="p">(</span><span class="n">nin</span><span class="p">,</span> <span class="n">sparsity</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">nout</span><span class="p">)]</span>

    <span class="k">def</span> <span class="nf">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">dense_grad</span><span class="o">=</span><span class="bp">False</span><span class="p">):</span>
        <span class="n">out</span> <span class="o">=</span> <span class="p">[</span><span class="n">n</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">dense_grad</span><span class="o">=</span><span class="n">dense_grad</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">neurons</span><span class="p">]</span>
        <span class="k">return</span> <span class="n">out</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">out</span><span class="p">)</span> <span class="o">==</span> <span class="mi">1</span> <span class="k">else</span> <span class="n">out</span>

    <span class="k">def</span> <span class="nf">parameters</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">p</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">neurons</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">n</span><span class="p">.</span><span class="n">parameters</span><span class="p">()]</span>

    <span class="k">def</span> <span class="nf">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="sa">f</span><span class="s">"Layer of [</span><span class="si">{</span><span class="s">', '</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">neurons</span><span class="p">)</span><span class="si">}</span><span class="s">]"</span>

<span class="k">class</span> <span class="nc">SparseMLP</span><span class="p">(</span><span class="n">Module</span><span class="p">):</span>

    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">nin</span><span class="p">,</span> <span class="n">nouts</span><span class="p">,</span> <span class="n">sparsities</span><span class="p">):</span>
        <span class="n">sz</span> <span class="o">=</span> <span class="p">[</span><span class="n">nin</span><span class="p">]</span> <span class="o">+</span> <span class="n">nouts</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">layers</span> <span class="o">=</span> <span class="p">[</span><span class="n">SparseLayer</span><span class="p">(</span><span class="n">sz</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">sz</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">],</span> <span class="n">sparsity</span><span class="o">=</span><span class="n">sparsities</span><span class="p">[</span><span class="n">i</span><span class="p">],</span>
                                   <span class="n">nonlin</span><span class="o">=</span><span class="n">i</span><span class="o">!=</span><span class="nb">len</span><span class="p">(</span><span class="n">nouts</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">nouts</span><span class="p">))]</span>

    <span class="k">def</span> <span class="nf">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">dense_grad</span><span class="o">=</span><span class="bp">False</span><span class="p">):</span>
        <span class="k">for</span> <span class="n">layer</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">layers</span><span class="p">:</span>
            <span class="n">x</span> <span class="o">=</span> <span class="n">layer</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">dense_grad</span><span class="o">=</span><span class="n">dense_grad</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">x</span>

    <span class="k">def</span> <span class="nf">parameters</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="p">[</span><span class="n">p</span> <span class="k">for</span> <span class="n">layer</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">layers</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">layer</span><span class="p">.</span><span class="n">parameters</span><span class="p">()]</span>

    <span class="k">def</span> <span class="nf">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="n">main_str</span> <span class="o">=</span> <span class="s">'</span><span class="se">\n</span><span class="s">'</span><span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">layer</span><span class="p">)</span> <span class="k">for</span> <span class="n">layer</span> <span class="ow">in</span> <span class="bp">self</span><span class="p">.</span><span class="n">layers</span><span class="p">)</span>
        <span class="k">return</span> <span class="sa">f</span><span class="s">"MLP of [</span><span class="se">\n</span><span class="si">{</span><span class="n">main_str</span><span class="si">}</span><span class="se">\n</span><span class="s">]"</span>

<span class="c1"># loss function
</span><span class="k">def</span> <span class="nf">loss</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="bp">None</span><span class="p">,</span> <span class="n">dense_grad</span><span class="o">=</span><span class="bp">False</span><span class="p">):</span>

    <span class="c1"># inline DataLoader :)
</span>    <span class="k">if</span> <span class="n">batch_size</span> <span class="ow">is</span> <span class="bp">None</span><span class="p">:</span>
        <span class="n">Xb</span><span class="p">,</span> <span class="n">yb</span> <span class="o">=</span> <span class="n">X</span><span class="p">,</span> <span class="n">y</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">ri</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">permutation</span><span class="p">(</span><span class="n">X</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])[:</span><span class="n">batch_size</span><span class="p">]</span>
        <span class="n">Xb</span><span class="p">,</span> <span class="n">yb</span> <span class="o">=</span> <span class="n">X</span><span class="p">[</span><span class="n">ri</span><span class="p">],</span> <span class="n">y</span><span class="p">[</span><span class="n">ri</span><span class="p">]</span>
    <span class="n">inputs</span> <span class="o">=</span> <span class="p">[</span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="n">Value</span><span class="p">,</span> <span class="n">xrow</span><span class="p">))</span> <span class="k">for</span> <span class="n">xrow</span> <span class="ow">in</span> <span class="n">Xb</span><span class="p">]</span>

    <span class="c1"># forward the model to get scores
</span>    <span class="n">scores</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="n">partial</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">dense_grad</span><span class="o">=</span><span class="n">dense_grad</span><span class="p">),</span> <span class="n">inputs</span><span class="p">))</span>

    <span class="c1"># svm "max-margin" loss
</span>    <span class="n">losses</span> <span class="o">=</span> <span class="p">[(</span><span class="mi">1</span> <span class="o">+</span> <span class="o">-</span><span class="n">yi</span><span class="o">*</span><span class="n">scorei</span><span class="p">).</span><span class="n">relu</span><span class="p">()</span> <span class="k">for</span> <span class="n">yi</span><span class="p">,</span> <span class="n">scorei</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">yb</span><span class="p">,</span> <span class="n">scores</span><span class="p">)]</span>
    <span class="n">data_loss</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">losses</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="mf">1.0</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">losses</span><span class="p">))</span>
    <span class="c1"># L2 regularization
</span>    <span class="n">alpha</span> <span class="o">=</span> <span class="mf">1e-4</span>
    <span class="n">reg_loss</span> <span class="o">=</span> <span class="n">alpha</span> <span class="o">*</span> <span class="nb">sum</span><span class="p">((</span><span class="n">p</span><span class="o">*</span><span class="n">p</span> <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">model</span><span class="p">.</span><span class="n">parameters</span><span class="p">()))</span>
    <span class="n">total_loss</span> <span class="o">=</span> <span class="n">data_loss</span> <span class="o">+</span> <span class="n">reg_loss</span>

    <span class="c1"># also get accuracy
</span>    <span class="n">accuracy</span> <span class="o">=</span> <span class="p">[(</span><span class="n">yi</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="o">==</span> <span class="p">(</span><span class="n">scorei</span><span class="p">.</span><span class="n">data</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span> <span class="k">for</span> <span class="n">yi</span><span class="p">,</span> <span class="n">scorei</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">yb</span><span class="p">,</span> <span class="n">scores</span><span class="p">)]</span>
    <span class="k">return</span> <span class="n">total_loss</span><span class="p">,</span> <span class="nb">sum</span><span class="p">(</span><span class="n">accuracy</span><span class="p">)</span> <span class="o">/</span> <span class="nb">len</span><span class="p">(</span><span class="n">accuracy</span><span class="p">)</span>
</code></pre></div></div>
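
<p>To make the arithmetic in <code class="language-plaintext highlighter-rouge">loss</code> easier to follow, here is a minimal sketch of the same max-margin loss, L2 penalty and accuracy on plain Python floats. It assumes labels of -1 or +1; the function name <code class="language-plaintext highlighter-rouge">hinge_loss_and_accuracy</code> and the toy numbers are hypothetical and only for illustration.</p>

```python
def hinge_loss_and_accuracy(scores, labels, params=(), alpha=1e-4):
    """Mean max-margin loss plus L2 penalty, and the fraction of correct signs.

    `labels` are expected to be -1 or +1; `params` is a flat list of weights.
    """
    losses = [max(0.0, 1.0 - y * s) for y, s in zip(labels, scores)]
    data_loss = sum(losses) / len(losses)
    reg_loss = alpha * sum(p * p for p in params)
    correct = [(y > 0) == (s > 0) for y, s in zip(labels, scores)]
    return data_loss + reg_loss, sum(correct) / len(correct)


# First example: correct with margin (loss 0). Second example: wrong sign (loss 1.5).
total, acc = hinge_loss_and_accuracy(scores=[2.0, 0.5], labels=[1, -1])
print(total, acc)  # prints 0.75 0.5
```

<p>An example only stops contributing to the loss once its score has the right sign and a margin of at least 1, which is why a correctly classified point can still produce a gradient.</p>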

<h1 id="our-sparse-network">Our Sparse Network</h1>
<p>Here we create a sparse network. Note that there are only 2 input dimensions and we want to use both of them, so we set the sparsity of the first layer to 0. The last layer has a single output neuron, so we set its sparsity to a lower value than that of the middle layer.</p>

<p>These sparsities are selected arbitrarily; feel free to play with them.</p>

<p>Let’s create this model and train it with SGD.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">1337</span><span class="p">)</span>
<span class="n">random</span><span class="p">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">1337</span><span class="p">)</span>
<span class="n">static_sparse_model</span> <span class="o">=</span> <span class="n">SparseMLP</span><span class="p">(</span><span class="n">nin</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">nouts</span><span class="o">=</span><span class="p">[</span><span class="mi">16</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">sparsities</span><span class="o">=</span><span class="p">[</span><span class="mf">0.</span><span class="p">,</span><span class="mf">0.9</span><span class="p">,</span><span class="mf">0.8</span><span class="p">])</span> <span class="c1"># 2-layer neural network
</span><span class="k">print</span><span class="p">(</span><span class="n">static_sparse_model</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">"number of parameters"</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">static_sparse_model</span><span class="p">.</span><span class="n">parameters</span><span class="p">()))</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MLP of [
Layer of [ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2)]
Layer of [ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2)]
Layer of [LinearNeuron(4)]
]
number of parameters 101
</code></pre></div></div>
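
<p>As a quick sanity check on the count above, here is a minimal sketch that recomputes it from the printed fan-ins. It assumes the number shown in each neuron (for example <code class="language-plaintext highlighter-rouge">ReLUNeuron(2)</code>) is its count of active incoming weights and that every neuron carries one bias; the variable names are only for illustration.</p>

```python
# Fan-in of every neuron as printed above: 16 + 16 ReLU neurons with 2 active
# inputs each, and one linear output neuron with 4 active inputs.
fan_in = [2] * 16 + [2] * 16 + [4]

# Assumption: each neuron has its active weights plus a single bias.
n_params = sum(n + 1 for n in fan_in)
print("number of parameters", n_params)
```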

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">TRAIN_STEPS</span> <span class="o">=</span> <span class="mi">400</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">TRAIN_STEPS</span><span class="p">):</span>

    <span class="c1"># forward
</span>    <span class="n">total_loss</span><span class="p">,</span> <span class="n">acc</span> <span class="o">=</span> <span class="n">loss</span><span class="p">(</span><span class="n">static_sparse_model</span><span class="p">)</span>

    <span class="c1"># backward
</span>    <span class="n">static_sparse_model</span><span class="p">.</span><span class="n">zero_grad</span><span class="p">()</span>
    <span class="n">total_loss</span><span class="p">.</span><span class="n">backward</span><span class="p">()</span>

    <span class="c1"># update (sgd)
</span>    <span class="n">learning_rate</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">-</span> <span class="mf">0.9</span><span class="o">*</span><span class="n">k</span><span class="o">/</span><span class="n">TRAIN_STEPS</span>
    <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">static_sparse_model</span><span class="p">.</span><span class="n">parameters</span><span class="p">():</span>
        <span class="n">p</span><span class="p">.</span><span class="n">data</span> <span class="o">-=</span> <span class="n">learning_rate</span> <span class="o">*</span> <span class="n">p</span><span class="p">.</span><span class="n">grad</span>

    <span class="k">if</span> <span class="n">k</span> <span class="o">%</span> <span class="mi">20</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"step </span><span class="si">{</span><span class="n">k</span><span class="si">}</span><span class="s"> loss </span><span class="si">{</span><span class="n">total_loss</span><span class="p">.</span><span class="n">data</span><span class="si">}</span><span class="s">, accuracy </span><span class="si">{</span><span class="n">acc</span><span class="o">*</span><span class="mi">100</span><span class="si">}</span><span class="s">%"</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"step </span><span class="si">{</span><span class="n">k</span><span class="si">}</span><span class="s"> loss </span><span class="si">{</span><span class="n">total_loss</span><span class="p">.</span><span class="n">data</span><span class="si">}</span><span class="s">, accuracy </span><span class="si">{</span><span class="n">acc</span><span class="o">*</span><span class="mi">100</span><span class="si">}</span><span class="s">%"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>step 0 loss 0.9999118995994747, accuracy 50.0%
step 20 loss 0.9501034046836242, accuracy 50.0%
step 40 loss 0.3029776283757716, accuracy 87.0%
step 60 loss 0.297169433748757, accuracy 87.0%
step 80 loss 0.29289542954897096, accuracy 88.0%
step 100 loss 0.2931921490672611, accuracy 87.0%
step 120 loss 0.2989848931028508, accuracy 88.0%
step 140 loss 0.2911950144271784, accuracy 87.0%
step 160 loss 0.29066315530929493, accuracy 88.0%
step 180 loss 0.2899413473514984, accuracy 88.0%
step 200 loss 0.29011181298721433, accuracy 88.0%
step 220 loss 0.2897383548597768, accuracy 88.0%
step 240 loss 0.289301900462062, accuracy 88.0%
step 260 loss 0.2886158754721536, accuracy 87.0%
step 280 loss 0.2881950427716646, accuracy 88.0%
step 300 loss 0.28796501931513896, accuracy 87.0%
step 320 loss 0.28737672236751133, accuracy 88.0%
step 340 loss 0.28752147018854757, accuracy 88.0%
step 360 loss 0.28687039533784287, accuracy 88.0%
step 380 loss 0.28676645061703504, accuracy 88.0%
step 399 loss 0.28654023971513226, accuracy 88.0%
</code></pre></div></div>

<h1 id="difficulty-of-training-sparse-networks">Difficulty of Training Sparse Networks</h1>
<p>Sparse training is difficult. This has been reported by many, including the <a href="https://arxiv.org/abs/1803.03635">Lottery Ticket Hypothesis</a> paper. <a href="https://arxiv.org/abs/1911.11134">RigL</a> is one of the recent dynamic sparse training algorithms (others include <a href="https://arxiv.org/abs/1907.04840">SNFS</a>, <a href="https://www.nature.com/articles/s41467-018-04316-3">SET</a>, and <a href="https://arxiv.org/abs/1902.05967">DSR</a>). It dynamically changes how the neurons are wired during training and, by doing so, improves the training of sparse networks.</p>

<p>Here is a summary of how RigL updates connections:</p>

<p>1) Calculate the dense gradient, which gives us the gradient of non-existing connections through <code class="language-plaintext highlighter-rouge">SparseNeuron.zero_ws</code>.</p>

<p>2) For each layer, obtain the existing and candidate parameters.</p>

<p>3) If there are no parameters to update, continue with the next layer.</p>

<p>4) Pick the candidate connections with the highest gradient magnitude.</p>

<p>5) Pick the existing connections with the lowest magnitude.</p>

<p>6) Replace the lowest-magnitude connections with the new ones that have high expected gradient.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">math</span>
<span class="kn">from</span> <span class="nn">micrograd.rigl</span> <span class="kn">import</span> <span class="n">top_k_param_dict</span>

<span class="k">def</span> <span class="nf">rigl_update_layer</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">update_fraction</span><span class="o">=</span><span class="mf">0.3</span><span class="p">):</span>
    <span class="c1"># (1) Calculating dense gradient
</span>    <span class="n">total_loss</span><span class="p">,</span> <span class="n">acc</span> <span class="o">=</span> <span class="n">loss</span><span class="p">(</span><span class="n">model</span><span class="p">,</span> <span class="n">dense_grad</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    <span class="n">model</span><span class="p">.</span><span class="n">zero_grad</span><span class="p">()</span>
    <span class="n">total_loss</span><span class="p">.</span><span class="n">backward</span><span class="p">()</span>

    <span class="k">for</span> <span class="n">layer</span> <span class="ow">in</span> <span class="n">model</span><span class="p">.</span><span class="n">layers</span><span class="p">:</span>
        <span class="n">n_weights</span> <span class="o">=</span> <span class="mi">0</span>
        <span class="c1"># (2) For each layer obtain existing and candidate parameters.
</span>        <span class="n">candidate_params</span><span class="p">,</span> <span class="n">params</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[]</span>
        <span class="k">for</span> <span class="n">j</span><span class="p">,</span> <span class="n">neuron</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">layer</span><span class="p">.</span><span class="n">neurons</span><span class="p">):</span>
            <span class="n">n_weights</span> <span class="o">+=</span> <span class="nb">len</span><span class="p">(</span><span class="n">neuron</span><span class="p">.</span><span class="n">w</span><span class="p">)</span>
            <span class="c1"># Decide connections to grow (pick top gradient magnitude).
</span>            <span class="n">candidate_params</span><span class="p">.</span><span class="n">extend</span><span class="p">([(</span><span class="n">p</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span>  <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">neuron</span><span class="p">.</span><span class="n">zero_ws</span><span class="p">.</span><span class="n">items</span><span class="p">()])</span>
            <span class="n">params</span><span class="p">.</span><span class="n">extend</span><span class="p">([(</span><span class="n">p</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">)</span>  <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">neuron</span><span class="p">.</span><span class="n">w</span><span class="p">.</span><span class="n">items</span><span class="p">()])</span>
            <span class="c1"># Done with zero_ws delete them.
</span>            <span class="n">neuron</span><span class="p">.</span><span class="n">zero_ws</span> <span class="o">=</span> <span class="p">{}</span>       

        <span class="c1"># (3) If no parameters to update, skip this layer
</span>        <span class="n">n_update</span> <span class="o">=</span> <span class="n">math</span><span class="p">.</span><span class="n">floor</span><span class="p">(</span><span class="n">n_weights</span> <span class="o">*</span> <span class="n">update_fraction</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">n_update</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
            <span class="c1"># Not updating
</span>            <span class="k">continue</span>

        <span class="c1"># (4) Obtain candidate connections with highest gradient magnitude.
</span>        <span class="n">top_grad_fn</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="nb">abs</span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="n">grad</span><span class="p">)</span>
        <span class="n">top_k_candidate_params</span> <span class="o">=</span> <span class="n">top_k_param_dict</span><span class="p">(</span><span class="n">candidate_params</span><span class="p">,</span> <span class="n">n_update</span><span class="p">,</span> <span class="n">sort_fn</span><span class="o">=</span><span class="n">top_grad_fn</span><span class="p">)</span>
        <span class="c1"># (5) Obtain existing connections with lowest magnitude.
</span>        <span class="n">least_magnitude_fn</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">p</span><span class="p">:</span> <span class="o">-</span><span class="nb">abs</span><span class="p">(</span><span class="n">p</span><span class="p">.</span><span class="n">data</span><span class="p">)</span>
        <span class="n">bottom_k_params</span> <span class="o">=</span> <span class="n">top_k_param_dict</span><span class="p">(</span><span class="n">params</span><span class="p">,</span> <span class="n">n_update</span><span class="p">,</span> <span class="n">sort_fn</span><span class="o">=</span><span class="n">least_magnitude_fn</span><span class="p">)</span>
        <span class="c1"># (6) Replace least magnitude connections with new ones with high expected gradient.
</span>        <span class="k">for</span> <span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">),</span> <span class="p">(</span><span class="n">_</span><span class="p">,</span> <span class="n">i_new</span><span class="p">,</span> <span class="n">j_new</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">bottom_k_params</span><span class="p">,</span> <span class="n">top_k_candidate_params</span><span class="p">):</span>
            <span class="k">del</span> <span class="n">layer</span><span class="p">.</span><span class="n">neurons</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">w</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
            <span class="n">p</span><span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="mf">0.</span>
            <span class="n">layer</span><span class="p">.</span><span class="n">neurons</span><span class="p">[</span><span class="n">j_new</span><span class="p">].</span><span class="n">w</span><span class="p">[</span><span class="n">i_new</span><span class="p">]</span> <span class="o">=</span> <span class="n">p</span>
</code></pre></div></div>
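
<p>The body of <code class="language-plaintext highlighter-rouge">top_k_param_dict</code> is not shown in this post. The sketch below is only a guess at what such a helper could look like: <code class="language-plaintext highlighter-rouge">top_k_by_score</code> and the tiny <code class="language-plaintext highlighter-rouge">Param</code> class are hypothetical stand-ins, not the actual code in <code class="language-plaintext highlighter-rouge">micrograd.rigl</code>. It shows how one selection routine can serve both the grow step (largest gradient magnitude) and the drop step (smallest weight magnitude, by negating the score).</p>

```python
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical stand-in for a micrograd Value: a weight and its gradient."""
    data: float
    grad: float


def top_k_by_score(items, k, score_fn):
    """Return the k (param, i, j) triples with the largest score_fn(param)."""
    return sorted(items, key=lambda t: score_fn(t[0]), reverse=True)[:k]


# (param, input index i, neuron index j), mirroring the triples built above.
existing = [(Param(0.9, 0.1), 0, 0), (Param(-0.05, 0.3), 1, 0), (Param(0.4, -0.2), 0, 1)]
candidates = [(Param(0.0, -0.8), 2, 0), (Param(0.0, 0.05), 1, 1)]

# Grow: candidate connection with the largest gradient magnitude.
grow = top_k_by_score(candidates, 1, lambda p: abs(p.grad))
# Drop: existing connection with the smallest weight magnitude.
drop = top_k_by_score(existing, 1, lambda p: -abs(p.data))

print("grow", [(i, j) for _, i, j in grow])
print("drop", [(i, j) for _, i, j in drop])
```

<p>In this toy example the connection at input 2 of neuron 0 would be grown and the one at input 1 of neuron 0 would be dropped.</p>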

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">1337</span><span class="p">)</span>
<span class="n">random</span><span class="p">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">1337</span><span class="p">)</span>
<span class="n">rigl_model</span> <span class="o">=</span> <span class="n">SparseMLP</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="p">[</span><span class="mi">16</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="p">[</span><span class="mf">0.</span><span class="p">,</span> <span class="mf">0.9</span><span class="p">,</span> <span class="mf">0.8</span><span class="p">])</span>
<span class="n">N_TOTAL</span> <span class="o">=</span> <span class="mi">400</span>
<span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">N_TOTAL</span><span class="p">):</span>    
    <span class="c1"># forward
</span>    <span class="n">total_loss</span><span class="p">,</span> <span class="n">acc</span> <span class="o">=</span> <span class="n">loss</span><span class="p">(</span><span class="n">rigl_model</span><span class="p">)</span>

    <span class="c1"># backward
</span>    <span class="n">rigl_model</span><span class="p">.</span><span class="n">zero_grad</span><span class="p">()</span>
    <span class="n">total_loss</span><span class="p">.</span><span class="n">backward</span><span class="p">()</span>

    <span class="c1"># update (sgd)
</span>    <span class="n">learning_rate</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">-</span> <span class="mf">0.9</span><span class="o">*</span><span class="n">k</span><span class="o">/</span><span class="n">N_TOTAL</span>
    <span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">rigl_model</span><span class="p">.</span><span class="n">parameters</span><span class="p">():</span>
        <span class="n">p</span><span class="p">.</span><span class="n">data</span> <span class="o">-=</span> <span class="n">learning_rate</span> <span class="o">*</span> <span class="n">p</span><span class="p">.</span><span class="n">grad</span>

    <span class="k">if</span> <span class="n">k</span> <span class="o">%</span> <span class="mi">20</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
        <span class="k">print</span><span class="p">(</span><span class="sa">f</span><span class="s">"step </span><span class="si">{</span><span class="n">k</span><span class="si">}</span><span class="s"> loss </span><span class="si">{</span><span class="n">total_loss</span><span class="p">.</span><span class="n">data</span><span class="si">}</span><span class="s">, accuracy </span><span class="si">{</span><span class="n">acc</span><span class="o">*</span><span class="mi">100</span><span class="si">}</span><span class="s">%"</span><span class="p">)</span>
        <span class="k">if</span> <span class="n">k</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">:</span> <span class="n">rigl_update_layer</span><span class="p">(</span><span class="n">rigl_model</span><span class="p">,</span> <span class="n">update_fraction</span><span class="o">=</span><span class="n">learning_rate</span><span class="o">*</span><span class="mf">0.3</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>step 0 loss 0.9999118995994747, accuracy 50.0%
step 20 loss 0.9501034046836242, accuracy 50.0%
step 40 loss 0.31005307891543676, accuracy 86.0%
step 60 loss 0.29439281503845477, accuracy 87.0%
step 80 loss 0.29081179132781054, accuracy 88.0%
step 100 loss 0.2876179472370331, accuracy 88.0%
step 120 loss 0.2844175041217564, accuracy 86.0%
step 140 loss 0.3083420441792058, accuracy 85.0%
step 160 loss 0.2796200131507172, accuracy 88.0%
step 180 loss 0.270197735972451, accuracy 89.0%
step 200 loss 0.2548698443690211, accuracy 90.0%
step 220 loss 0.2519338433937867, accuracy 90.0%
step 240 loss 0.14442011238413255, accuracy 96.0%
step 260 loss 0.22868415294455974, accuracy 96.0%
step 280 loss 0.06459693547356511, accuracy 97.0%
step 300 loss 0.061831516333362764, accuracy 100.0%
step 320 loss 0.04567886755653798, accuracy 99.0%
step 340 loss 0.04156236561415438, accuracy 100.0%
step 360 loss 0.03462738757482692, accuracy 100.0%
step 380 loss 0.031153761540187855, accuracy 100.0%
</code></pre></div></div>
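
<p>The training loop above passes <code class="language-plaintext highlighter-rouge">update_fraction=learning_rate*0.3</code>, so the number of rewired connections decays together with the learning rate. Here is a small sketch of that schedule. The layer sizes 32 and 4 are read off the printed model (16 neurons with 2 active weights, and an output neuron with 4), and the helper name <code class="language-plaintext highlighter-rouge">n_update</code> is only for illustration.</p>

```python
import math

N_TOTAL = 400


def n_update(k, n_weights, base_fraction=0.3):
    """Connections swapped at step k in a layer with n_weights active weights."""
    learning_rate = 1 - 0.9 * k / N_TOTAL
    update_fraction = learning_rate * base_fraction
    return math.floor(n_weights * update_fraction)


# Columns: step, swaps in a 32-weight layer, swaps in the 4-weight output layer.
for k in (20, 60, 80, 200, 380):
    print(k, n_update(k, 32), n_update(k, 4))
```

<p>Under these assumptions the 32-weight layer goes from 9 swaps at step 20 down to 1 at step 380, while the 4-weight output layer stops being rewired after step 60 because the floor rounds its count down to zero.</p>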

<h2 id="results">Results</h2>
<p>RigL reaches a loss of 0.031 with 100% accuracy, whereas static sparse training reaches a loss of 0.287 with 88% accuracy.</p>

<p>Visualizing the connectivity of the model after the RigL training reveals interesting insights. We see that the available connections are used by a few important neurons and many internal neurons are discarded.</p>

<p>For example, in the figure below, neurons <code class="language-plaintext highlighter-rouge">2_3</code> and <code class="language-plaintext highlighter-rouge">2_1</code> hold most of the connections of the second layer. The other active neurons (<code class="language-plaintext highlighter-rouge">2_4</code> and <code class="language-plaintext highlighter-rouge">2_2</code>) each have a single incoming connection. All the remaining neurons are dead, which means they don’t have any incoming or outgoing connections and therefore can’t affect the output. We can safely remove dead units. Let’s do that.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">draw_topology</span><span class="p">(</span><span class="n">rigl_model</span><span class="p">,</span> <span class="n">rankdir</span><span class="o">=</span><span class="s">'TB'</span><span class="p">)</span>
</code></pre></div></div>

<p><img src="/assets/images/micrograd_sparse/output_15_0.svg" alt="svg" /></p>

<h3 id="stripping-dead-units">Stripping Dead Units</h3>
<p>After removing the dead units, we are left with a compact 10,4,1 architecture (compared to the original 16,16,1).</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">micrograd.rigl</span> <span class="kn">import</span> <span class="n">strip_deadneurons</span>     
<span class="n">compressed_rigl_model</span> <span class="o">=</span> <span class="n">strip_deadneurons</span><span class="p">(</span><span class="n">rigl_model</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">compressed_rigl_model</span><span class="p">)</span>
<span class="n">draw_topology</span><span class="p">(</span><span class="n">compressed_rigl_model</span><span class="p">,</span> <span class="n">rankdir</span><span class="o">=</span><span class="s">'TB'</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>MLP of [
Layer of [ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2), ReLUNeuron(2)]
Layer of [ReLUNeuron(5), ReLUNeuron(1), ReLUNeuron(7), ReLUNeuron(1)]
Layer of [LinearNeuron(4)]
]
</code></pre></div></div>

<p><img src="/assets/images/micrograd_sparse/output_17_1.svg" alt="svg" /></p>
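
<p>To make the idea of a dead unit concrete, here is a minimal sketch that finds the units still reaching the output, given only the connectivity. It is not the <code class="language-plaintext highlighter-rouge">strip_deadneurons</code> implementation; <code class="language-plaintext highlighter-rouge">live_units</code> and the toy connectivity are hypothetical. It also only checks outgoing paths, so a neuron without incoming connections would still be kept if something reads from it.</p>

```python
def live_units(incoming):
    """incoming[l][j] is the set of input indices feeding neuron j of layer l.

    Returns, per layer, the sorted neuron indices that still reach the output.
    """
    keep = [set() for _ in incoming]
    keep[-1] = set(range(len(incoming[-1])))  # output neurons are always kept
    # Walk backwards: a neuron survives only if a surviving neuron reads from it.
    for l in range(len(incoming) - 1, 0, -1):
        for j in keep[l]:
            keep[l - 1].update(incoming[l][j])
    return [sorted(k) for k in keep]


# Toy connectivity (hypothetical, not the trained model above).
toy = [
    [{0, 1}, {0, 1}, {0, 1}, {0, 1}],  # layer 0: 4 neurons, dense on 2 inputs
    [{0, 2}, {1}, {3}],                # layer 1: 3 neurons
    [{0, 1}],                          # layer 2: output reads neurons 0 and 1
]
print(live_units(toy))
```

<p>In the toy network the last neuron of layer 1 has no outgoing connection, so it is dead, and so is the layer 0 neuron that only feeds it.</p>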

<h1 id="visualizing-the-decision-boundries">Visualizing the decision boundaries</h1>
<p>Let’s (yet again) use Andrej’s code to visualize the decision boundaries.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># visualize decision boundary
</span>
<span class="n">h</span> <span class="o">=</span> <span class="mf">0.25</span>
<span class="n">x_min</span><span class="p">,</span> <span class="n">x_max</span> <span class="o">=</span> <span class="n">X</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">].</span><span class="nb">min</span><span class="p">()</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">X</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">].</span><span class="nb">max</span><span class="p">()</span> <span class="o">+</span> <span class="mi">1</span>
<span class="n">y_min</span><span class="p">,</span> <span class="n">y_max</span> <span class="o">=</span> <span class="n">X</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">].</span><span class="nb">min</span><span class="p">()</span> <span class="o">-</span> <span class="mi">1</span><span class="p">,</span> <span class="n">X</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">].</span><span class="nb">max</span><span class="p">()</span> <span class="o">+</span> <span class="mi">1</span>
<span class="n">xx</span><span class="p">,</span> <span class="n">yy</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">meshgrid</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">x_min</span><span class="p">,</span> <span class="n">x_max</span><span class="p">,</span> <span class="n">h</span><span class="p">),</span>
                     <span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="n">y_min</span><span class="p">,</span> <span class="n">y_max</span><span class="p">,</span> <span class="n">h</span><span class="p">))</span>
<span class="n">Xmesh</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">c_</span><span class="p">[</span><span class="n">xx</span><span class="p">.</span><span class="n">ravel</span><span class="p">(),</span> <span class="n">yy</span><span class="p">.</span><span class="n">ravel</span><span class="p">()]</span>
<span class="n">Zs</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">inputs</span> <span class="o">=</span> <span class="p">[</span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="n">Value</span><span class="p">,</span> <span class="n">xrow</span><span class="p">))</span> <span class="k">for</span> <span class="n">xrow</span> <span class="ow">in</span> <span class="n">Xmesh</span><span class="p">]</span>
<span class="k">for</span> <span class="n">m</span> <span class="ow">in</span> <span class="p">[</span><span class="n">static_sparse_model</span><span class="p">,</span> <span class="n">rigl_model</span><span class="p">,</span> <span class="n">compressed_rigl_model</span><span class="p">]:</span>
  <span class="n">scores</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">inputs</span><span class="p">))</span>
  <span class="n">Z</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">s</span><span class="p">.</span><span class="n">data</span> <span class="o">&gt;</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">scores</span><span class="p">])</span>
  <span class="n">Zs</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">Z</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">xx</span><span class="p">.</span><span class="n">shape</span><span class="p">))</span>

<span class="n">fig</span><span class="p">,</span> <span class="n">axs</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="n">subplots</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span> <span class="mi">5</span><span class="p">))</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">key</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">([</span><span class="s">'Static model'</span><span class="p">,</span> <span class="s">'RigL model'</span><span class="p">,</span> <span class="s">'Compressed RigL model'</span><span class="p">]):</span>
  <span class="n">plt</span><span class="p">.</span><span class="n">axes</span><span class="p">(</span><span class="n">axs</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
  <span class="n">plt</span><span class="p">.</span><span class="n">contourf</span><span class="p">(</span><span class="n">xx</span><span class="p">,</span> <span class="n">yy</span><span class="p">,</span> <span class="n">Zs</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">cmap</span><span class="o">=</span><span class="n">plt</span><span class="p">.</span><span class="n">cm</span><span class="p">.</span><span class="n">Spectral</span><span class="p">,</span> <span class="n">alpha</span><span class="o">=</span><span class="mf">0.8</span><span class="p">)</span>
  <span class="n">plt</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">X</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">],</span> <span class="n">X</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">c</span><span class="o">=</span><span class="n">y</span><span class="p">,</span> <span class="n">s</span><span class="o">=</span><span class="mi">40</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="n">plt</span><span class="p">.</span><span class="n">cm</span><span class="p">.</span><span class="n">Spectral</span><span class="p">)</span>
  <span class="n">plt</span><span class="p">.</span><span class="n">xlim</span><span class="p">(</span><span class="n">xx</span><span class="p">.</span><span class="nb">min</span><span class="p">(),</span> <span class="n">xx</span><span class="p">.</span><span class="nb">max</span><span class="p">())</span>
  <span class="n">plt</span><span class="p">.</span><span class="n">ylim</span><span class="p">(</span><span class="n">yy</span><span class="p">.</span><span class="nb">min</span><span class="p">(),</span> <span class="n">yy</span><span class="p">.</span><span class="nb">max</span><span class="p">())</span>
  <span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>

</code></pre></div></div>

<p><img src="/assets/images/micrograd_sparse/output_19_0.png" alt="png" /></p>

<h3 id="acknowledgements">Acknowledgements</h3>
<p>Most of this notebook is based on Andrej Karpathy’s original demo at <a href="https://github.com/karpathy/micrograd">micrograd</a> repo. So I would like to thank Andrej for open-sourcing and sharing his code.</p>]]></content><author><name>Utku Evci</name><email>ue[224+1]@nyu.edu</email></author><category term="ml" /><category term="python" /><category term="ml" /><summary type="html"><![CDATA[Implementing sparse networks using neurons as building blocks.]]></summary></entry><entry><title type="html">June Leet Code Selection</title><link href="https://evcu.github.io/algorithms/leet4/" rel="alternate" type="text/html" title="June Leet Code Selection" /><published>2019-06-01T00:00:00-04:00</published><updated>2019-06-01T00:00:00-04:00</updated><id>https://evcu.github.io/algorithms/leet4</id><content type="html" xml:base="https://evcu.github.io/algorithms/leet4/"><![CDATA[<h3 id="leet-code-practice-selection">Leet Code Practice Selection</h3>
<h4 id="32-longest-valid-parentheses">32. <a href="https://leetcode.com/problems/longest-valid-parentheses">Longest Valid Parentheses</a></h4>
<p>Given a string containing just the characters ‘(‘ and ‘)’, find the length of the longest valid (well-formed) parentheses substring.</p>

<p><strong>Solution</strong></p>
<ul>
  <li>It took me a while to solve this question. I actually failed to solve it, and only
then looked at the solutions and rewrote the two approaches.</li>
  <li>My failed attempt was based on counting the parentheses. A valid substring will
have a zero sum over left and right parentheses. I jumped into coding and failed.
This is a necessary but not a sufficient condition: <code class="language-plaintext highlighter-rouge">)(</code> is not valid but has zero sum.</li>
  <li>I think the key to solving the problem is to understand the structure of the
solution. A valid substring cannot end with <code class="language-plaintext highlighter-rouge">(</code>. If it ends with <code class="language-plaintext highlighter-rouge">)</code>, there should
be a matching parenthesis before it.</li>
  <li><strong>Solution 1</strong>: Use a stack to keep track of the indices of elements not matched yet. Start from the beginning.
Whenever we see <code class="language-plaintext highlighter-rouge">)</code>, look at the top of the stack and, if it matches, pop it. For
example, <code class="language-plaintext highlighter-rouge">(()</code> would end up with a single element in the stack, <code class="language-plaintext highlighter-rouge">(</code>. To determine
the length, we subtract the index left on top of the stack from the index of the current element.
If there is nothing in the stack, this means we matched everything before and therefore
the length is <code class="language-plaintext highlighter-rouge">i+1</code>.</li>
  <li><strong>Solution 2</strong>: This solution uses an array <code class="language-plaintext highlighter-rouge">longest[i]</code> which gives the length of the
longest valid substring ending at element i. Since a valid substring can only end with <code class="language-plaintext highlighter-rouge">)</code>,
we only process those indices. There are two cases for a substring to end at
index i: (1) <code class="language-plaintext highlighter-rouge">longest[i-1]==0</code> and <code class="language-plaintext highlighter-rouge">s[i-1]=='('</code>; (2) <code class="language-plaintext highlighter-rouge">longest[i-1]!=0</code> and
<code class="language-plaintext highlighter-rouge">s[i-longest[i-1]-1]=='('</code>. Note that just checking <code class="language-plaintext highlighter-rouge">s[i-longest[i-1]-1]=='('</code>
covers both cases as long as the index is valid. If there
is a substring ending at i, the only thing left is calculating the length and
updating the max.</li>
</ul>
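
<p>Solution 2 can be sketched roughly as below. This is my reading of the description rather than a reference solution. The function name is made up, and I assume that “calculating the length” also means adding the valid substring that ends right before the matching <code class="language-plaintext highlighter-rouge">(</code>.</p>

```python
def longest_valid_parentheses_dp(s):
    """longest[i] = length of the longest valid substring ending at index i."""
    longest = [0] * len(s)
    best = 0
    for i in range(1, len(s)):
        if s[i] != ')':
            continue  # a valid substring can only end with ')'
        # Index that must hold the '(' matching s[i]; covers both cases above.
        j = i - longest[i - 1] - 1
        if j >= 0 and s[j] == '(':
            # Assumption: also glue on the valid substring ending right before j.
            before = longest[j - 1] if j >= 1 else 0
            longest[i] = longest[i - 1] + 2 + before
            best = max(best, longest[i])
    return best


print(longest_valid_parentheses_dp("(()"))
print(longest_valid_parentheses_dp(")()())"))
```

<p>For the two inputs above this gives 2 and 4, in one pass over the string with a single extra array.</p>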

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">longestValidParentheses</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">s</span><span class="p">):</span>
    <span class="s">"""
    :type s: str
    :rtype: int
    """</span>
    <span class="n">stack</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">c_max</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">)):</span>
        <span class="k">if</span> <span class="n">s</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="s">')'</span> <span class="ow">and</span> <span class="n">stack</span> <span class="ow">and</span> <span class="n">s</span><span class="p">[</span><span class="n">stack</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]]</span> <span class="o">==</span> <span class="s">'('</span><span class="p">:</span>
            <span class="n">stack</span><span class="p">.</span><span class="n">pop</span><span class="p">()</span>
            <span class="n">start</span> <span class="o">=</span> <span class="n">stack</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="k">if</span> <span class="n">stack</span> <span class="k">else</span> <span class="o">-</span><span class="mi">1</span>
            <span class="n">c_max</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">c_max</span><span class="p">,</span> <span class="n">i</span> <span class="o">-</span> <span class="n">start</span><span class="p">)</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">stack</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">c_max</span>

<span class="k">def</span> <span class="nf">longestValidParentheses2</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">s</span><span class="p">):</span>
    <span class="s">"""
    :type s: str
    :rtype: int
    """</span>
    <span class="n">longest</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">)</span>
    <span class="n">c_max</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">s</span><span class="p">)):</span>
        <span class="k">if</span> <span class="n">s</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="s">')'</span><span class="p">:</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">i</span><span class="o">-</span><span class="n">longest</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="o">&gt;=</span><span class="mi">0</span> <span class="ow">and</span> <span class="n">s</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="n">longest</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="s">'('</span><span class="p">:</span>
                <span class="n">longest</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">+</span> <span class="n">longest</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
                <span class="k">if</span> <span class="p">(</span><span class="n">i</span><span class="o">-</span><span class="n">longest</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">-</span><span class="mi">2</span><span class="p">)</span><span class="o">&gt;</span><span class="mi">0</span><span class="p">:</span>
                    <span class="n">longest</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+=</span> <span class="n">longest</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="n">longest</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span>
                <span class="n">c_max</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="n">c_max</span><span class="p">,</span> <span class="n">longest</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
    <span class="k">return</span> <span class="n">c_max</span>
</code></pre></div></div>
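
<p>To make the <code class="language-plaintext highlighter-rouge">longest</code> array of Solution 2 a bit more concrete, here is a minimal standalone sketch that returns the whole table instead of only the maximum. The helper name <code class="language-plaintext highlighter-rouge">longest_ending_at</code> is mine and not part of the solutions above.</p>

```python
def longest_ending_at(s):
    """Return a list where entry i is the length of the longest valid
    parentheses substring that ends exactly at index i (0 if none)."""
    longest = [0] * len(s)
    for i in range(1, len(s)):
        if s[i] != ')':
            continue  # a valid substring can only end with ')'
        # Index where the '(' matching s[i] would have to be.
        j = i - longest[i - 1] - 1
        if j >= 0 and s[j] == '(':
            longest[i] = longest[i - 1] + 2
            if j >= 1:
                # Glue on a valid substring that ends right before s[j].
                longest[i] += longest[j - 1]
    return longest


print(longest_ending_at("()(())"))  # [0, 2, 0, 0, 2, 6]
print(longest_ending_at(")()())"))  # [0, 0, 2, 0, 4, 0]
```

<p>The answer to the original question is the maximum of this table (or 0 for an empty string).</p>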

<h4 id="496-next-greater-element-i">496. <a href="https://leetcode.com/problems/next-greater-element-i/submissions/">Next Greater Element I</a></h4>

<p>You are given two arrays (without duplicates) nums1 and nums2 where nums1’s elements are subset of nums2. Find all the next greater numbers for nums1’s elements in the corresponding places of nums2.</p>

<p>The Next Greater Number of a number x in nums1 is the first greater number to its right in nums2. If it does not exist, output -1 for this number.</p>

<p><strong>Solution</strong></p>
<ul>
  <li>We sweep through the elements of the second array, and whenever we
encounter an element that is also in the first array we push it onto the stack.</li>
  <li>During the sweep we also check whether the current element is greater than the elements on top of the stack.
If so, it is the next greater element for each of them, so we pop them and record the answer. Note that the elements in the stack are therefore always in decreasing order.
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">nextGreaterElement</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">nums1</span><span class="p">,</span> <span class="n">nums2</span><span class="p">):</span>
  <span class="s">"""
  :type nums1: List[int]
  :type nums2: List[int]
  :rtype: List[int]
  """</span>
  <span class="n">stack</span> <span class="o">=</span> <span class="p">[]</span>
  <span class="n">set_nums1</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="n">nums1</span><span class="p">)</span>
  <span class="n">res_dict</span> <span class="o">=</span> <span class="p">{}</span>
  <span class="k">for</span> <span class="n">el</span> <span class="ow">in</span> <span class="n">nums2</span><span class="p">:</span>
      <span class="k">while</span> <span class="n">stack</span> <span class="ow">and</span> <span class="n">stack</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">&lt;</span><span class="n">el</span><span class="p">:</span>
          <span class="n">res_dict</span><span class="p">[</span><span class="n">stack</span><span class="p">.</span><span class="n">pop</span><span class="p">()]</span> <span class="o">=</span> <span class="n">el</span>
      <span class="k">if</span> <span class="n">el</span> <span class="ow">in</span> <span class="n">set_nums1</span><span class="p">:</span>
          <span class="n">stack</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">el</span><span class="p">)</span>
  <span class="k">return</span> <span class="p">[</span><span class="n">res_dict</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">el</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="k">for</span> <span class="n">el</span> <span class="ow">in</span> <span class="n">nums1</span><span class="p">]</span>
</code></pre></div>    </div>
  </li>
</ul>
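
<p>A small standalone sketch of the same sweep, written as a plain function so it can be run outside the LeetCode class. The name <code class="language-plaintext highlighter-rouge">next_greater</code> and the sample input are my own choices for illustration.</p>

```python
def next_greater(nums1, nums2):
    """For each value in nums1, find the first greater value to its right
    in nums2, or -1 if there is none. Assumes no duplicates."""
    wanted = set(nums1)
    stack = []    # values from nums1 still waiting for a greater element
    answer = {}
    for el in nums2:
        # Everything on top of the stack that is smaller than el
        # has just found its next greater element.
        while stack and el > stack[-1]:
            answer[stack.pop()] = el
        if el in wanted:
            stack.append(el)
    return [answer.get(v, -1) for v in nums1]


print(next_greater([4, 1, 2], [1, 3, 4, 2]))  # [-1, 3, -1]
```

<p>Every element is pushed and popped at most once, so the sweep is linear in the length of the second array.</p>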

<h4 id="189-rotate-array">189. <a href="https://leetcode.com/problems/rotate-array">Rotate Array</a></h4>
<p>Given an array, rotate the array to the right by k steps, where k is non-negative.</p>

<p><strong>Solution</strong></p>
<ul>
  <li>Start by taking <code class="language-plaintext highlighter-rouge">k</code> modulo the array length so that we don’t do extra rotations.</li>
  <li>Let’s do it in place with constant extra space. For that we follow a cycle of positions until
we end up at the index we started from.</li>
  <li>There might be more than one cycle. So we increment the starting point by one
and repeat until our assignment counter reaches the length of the array. Then we stop.
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">rotate</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">nums</span><span class="p">,</span> <span class="n">k</span><span class="p">):</span>
  <span class="s">"""
  :type nums: List[int]
  :type k: int
  :rtype: None Do not return anything, modify nums in-place instead.
  """</span>
  <span class="n">k</span> <span class="o">=</span> <span class="n">k</span> <span class="o">%</span> <span class="nb">len</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span>
  <span class="k">if</span> <span class="n">k</span><span class="o">==</span><span class="mi">0</span><span class="p">:</span>
      <span class="k">return</span> <span class="n">nums</span>
  <span class="n">assign_counter</span> <span class="o">=</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span>
  <span class="k">while</span> <span class="n">assign_counter</span> <span class="o">!=</span> <span class="nb">len</span><span class="p">(</span><span class="n">nums</span><span class="p">):</span>
      <span class="n">temp</span> <span class="o">=</span> <span class="n">nums</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
      <span class="n">prev_j</span><span class="p">,</span> <span class="n">j</span> <span class="o">=</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">i</span> <span class="o">-</span> <span class="n">k</span><span class="p">)</span> <span class="o">%</span> <span class="nb">len</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span>
      <span class="k">while</span> <span class="n">j</span> <span class="o">!=</span> <span class="n">i</span><span class="p">:</span>
          <span class="n">nums</span><span class="p">[</span><span class="n">prev_j</span><span class="p">]</span> <span class="o">=</span> <span class="n">nums</span><span class="p">[</span><span class="n">j</span><span class="p">]</span>
          <span class="n">assign_counter</span> <span class="o">+=</span> <span class="mi">1</span>
          <span class="n">prev_j</span><span class="p">,</span> <span class="n">j</span> <span class="o">=</span> <span class="n">j</span><span class="p">,</span> <span class="p">(</span><span class="n">j</span> <span class="o">-</span> <span class="n">k</span><span class="p">)</span> <span class="o">%</span> <span class="nb">len</span><span class="p">(</span><span class="n">nums</span><span class="p">)</span>
      <span class="n">nums</span><span class="p">[</span><span class="n">prev_j</span><span class="p">]</span> <span class="o">=</span> <span class="n">temp</span>
      <span class="n">assign_counter</span> <span class="o">+=</span> <span class="mi">1</span>
      <span class="n">i</span> <span class="o">+=</span> <span class="mi">1</span>
</code></pre></div>    </div>
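
    <p>As a variation on the assignment counter, the number of cycles can also be computed up front: stepping by k through n positions visits them in gcd(n, k) separate cycles. Below is a minimal sketch under that assumption; <code class="language-plaintext highlighter-rouge">rotate_right</code> is a hypothetical name and this is not the solution above.</p>

```python
from math import gcd


def rotate_right(nums, k):
    """Rotate nums to the right by k steps, in place, with O(1) extra space."""
    n = len(nums)
    if n == 0:
        return
    k %= n
    if k == 0:
        return
    # Stepping by k splits the n positions into gcd(n, k) cycles.
    for start in range(gcd(n, k)):
        carried = nums[start]
        j = (start + k) % n
        while j != start:
            # Drop the carried value at j and pick up the value it replaces.
            nums[j], carried = carried, nums[j]
            j = (j + k) % n
        nums[start] = carried


a = [1, 2, 3, 4, 5, 6, 7]
rotate_right(a, 3)
print(a)  # [5, 6, 7, 1, 2, 3, 4]
```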
  </li>
</ul>]]></content><author><name>Utku Evci</name><email>ue[224+1]@nyu.edu</email></author><category term="algorithms" /><category term="python" /><summary type="html"><![CDATA[Weekend spend on coding practices, some of the selected questions...]]></summary></entry><entry><title type="html">April Leet Code Selection</title><link href="https://evcu.github.io/algorithms/leet3/" rel="alternate" type="text/html" title="April Leet Code Selection" /><published>2019-04-15T00:00:00-04:00</published><updated>2019-04-15T00:00:00-04:00</updated><id>https://evcu.github.io/algorithms/leet3</id><content type="html" xml:base="https://evcu.github.io/algorithms/leet3/"><![CDATA[<h3 id="leet-code-practice-selection">Leet Code Practice Selection</h3>
<h4 id="335-self-crossing">335. <a href="https://leetcode.com/problems/self-crossing/">Self Crossing</a></h4>
<p>You are given an array x of n positive numbers. You start at point (0,0) and move x[0] metres to the north, then x[1] metres to the west, x[2] metres to the south, x[3] metres to the east and so on. In other words, after each move your direction changes counter-clockwise.</p>

<p>Write a one-pass algorithm with O(1) extra space to determine, if your path crosses itself, or not.</p>

<p><strong>Solution</strong></p>
<ul>
  <li>First, I tried thinking of the ways the path can keep turning. It can grow
 indefinitely (when south&gt;north and east&gt;west). It can shrink for a while
 (south&lt;north and east&lt;west). It can also grow for a while and
 then shrink. Once it starts shrinking, it keeps shrinking until the end: either it
 crosses itself or it stops. So my first idea was to track these two states and
 check that the moves are consistent with them. I didn’t pursue this line of thinking.</li>
  <li>It is important that all values are positive, so the line keeps turning left.
You can also rotate your point of view so that every move is a north move.
This helps when writing the recursion.</li>
  <li>Later I realized that the validity of the current move depends only on the last
5 moves and wrote down the conditions under which no crossing happens. (At first I
thought the current move depended only on the last 4, then had to add a flag for the last
move.) The answer uses O(1) space and O(N) run time.</li>
  <li>After writing the solution I realized that a better approach would be to write the
recursion answering the question: <code class="language-plaintext highlighter-rouge">given past moves(5) does next move cut?</code>. This
could make the solution shorter and more legible.
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Solution</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
  <span class="k">def</span> <span class="nf">isSelfCrossing</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
      <span class="s">"""
      :type x: List[int]
      :rtype: bool
      """</span>
      <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">&lt;</span><span class="mi">5</span><span class="p">:</span>
          <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">==</span><span class="mi">4</span><span class="p">:</span>
              <span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">,</span> <span class="n">x3</span><span class="p">,</span> <span class="n">x4</span> <span class="o">=</span> <span class="n">x</span>
              <span class="n">rules</span> <span class="o">=</span> <span class="p">[(</span><span class="n">x1</span><span class="o">&gt;=</span><span class="n">x3</span> <span class="ow">and</span> <span class="n">x2</span><span class="o">&gt;</span><span class="n">x4</span><span class="p">),</span>
                       <span class="n">x3</span><span class="o">&gt;</span><span class="n">x1</span><span class="p">]</span>
              <span class="k">return</span> <span class="ow">not</span> <span class="nb">any</span><span class="p">(</span><span class="n">rules</span><span class="p">)</span>
          <span class="k">else</span><span class="p">:</span>
              <span class="k">return</span> <span class="bp">False</span>
      <span class="n">last_lim</span> <span class="o">=</span> <span class="bp">None</span>
      <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="o">-</span><span class="mi">4</span><span class="p">):</span>
          <span class="c1"># print(x[i:i+5], last_lim)
</span>          <span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">,</span> <span class="n">x3</span><span class="p">,</span> <span class="n">x4</span><span class="p">,</span> <span class="n">x5</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">:</span><span class="n">i</span><span class="o">+</span><span class="mi">5</span><span class="p">]</span>
          <span class="n">safe_moves</span> <span class="o">=</span> <span class="p">[(</span><span class="n">x1</span><span class="o">&gt;=</span><span class="n">x3</span> <span class="ow">and</span> <span class="n">x2</span><span class="o">&gt;</span><span class="n">x4</span><span class="p">),</span>
                        <span class="p">(</span><span class="n">x3</span><span class="o">&gt;</span><span class="n">x1</span> <span class="ow">and</span> <span class="n">x2</span><span class="o">&gt;</span><span class="n">x4</span> <span class="ow">and</span> <span class="n">x5</span><span class="o">&lt;</span><span class="n">x3</span><span class="p">),</span>
                        <span class="p">(</span><span class="n">x3</span><span class="o">&gt;</span><span class="n">x1</span> <span class="ow">and</span> <span class="n">x2</span><span class="o">==</span><span class="n">x4</span> <span class="ow">and</span> <span class="p">(</span><span class="n">x5</span><span class="o">+</span><span class="n">x1</span><span class="p">)</span><span class="o">&lt;</span><span class="n">x3</span><span class="p">),</span>
                        <span class="p">(</span><span class="n">x3</span><span class="o">&gt;</span><span class="n">x1</span> <span class="ow">and</span> <span class="n">x2</span><span class="o">&lt;</span><span class="n">x4</span><span class="p">)]</span>
          <span class="c1"># print(safe_moves)
</span>          <span class="k">if</span> <span class="p">(</span><span class="ow">not</span> <span class="nb">any</span><span class="p">(</span><span class="n">safe_moves</span><span class="p">)</span>
              <span class="ow">or</span> <span class="p">(</span><span class="n">last_lim</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span> <span class="ow">and</span> <span class="n">x5</span><span class="o">&gt;=</span><span class="n">last_lim</span><span class="p">)):</span>
              <span class="k">return</span> <span class="bp">True</span>
          <span class="c1"># Reset last_lim
</span>          <span class="n">last_lim</span> <span class="o">=</span> <span class="bp">None</span>
          <span class="k">if</span> <span class="p">(</span><span class="n">x3</span><span class="o">&gt;</span><span class="n">x1</span> <span class="ow">and</span> <span class="n">x2</span><span class="o">&lt;</span><span class="n">x4</span> <span class="ow">and</span> <span class="n">x5</span><span class="o">&lt;=</span><span class="n">x3</span> <span class="ow">and</span> <span class="n">x5</span><span class="o">&gt;=</span><span class="p">(</span><span class="n">x3</span><span class="o">-</span><span class="n">x1</span><span class="p">)):</span>
              <span class="n">last_lim</span> <span class="o">=</span> <span class="n">x4</span><span class="o">-</span><span class="n">x2</span>
      <span class="k">return</span> <span class="bp">False</span>
</code></pre></div>    </div>
  </li>
</ul>
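
<p>For reference, here is a sketch of what such a shorter version could look like. It follows the three-case formulation that is commonly used for this problem (the current line hits the line 3, 4 or 5 moves back), not a rewrite of my solution above, and the function name <code class="language-plaintext highlighter-rouge">is_self_crossing</code> is made up for this example.</p>

```python
def is_self_crossing(x):
    """Return True if the counter-clockwise path described by x crosses itself."""
    for i in range(3, len(x)):
        # Case 1: the current line crosses the line 3 moves back.
        if x[i] >= x[i - 2] and x[i - 3] >= x[i - 1]:
            return True
        # Case 2: the current line touches the line 4 moves back.
        if i >= 4 and x[i - 1] == x[i - 3] and x[i] + x[i - 4] >= x[i - 2]:
            return True
        # Case 3: the current line crosses the line 5 moves back.
        if (i >= 5
                and x[i - 2] >= x[i - 4]
                and x[i - 3] >= x[i - 1]
                and x[i - 1] + x[i - 5] >= x[i - 3]
                and x[i] + x[i - 4] >= x[i - 2]):
            return True
    return False


print(is_self_crossing([2, 1, 1, 2]))  # True
print(is_self_crossing([1, 2, 3, 4]))  # False
print(is_self_crossing([1, 1, 1, 1]))  # True
```

<p>Each step only looks at the last six values, so this is still one pass with O(1) extra space.</p>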

<h4 id="692-top-k-frequent-words">692. <a href="https://leetcode.com/problems/top-k-frequent-words/">Top K Frequent Words</a></h4>

<p>Given a non-empty list of words, return the k most frequent elements.</p>

<p>Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first.</p>

<p><strong>Solution</strong></p>
<ul>
  <li>Seems simple! It is, indeed, especially with <code class="language-plaintext highlighter-rouge">collections.Counter</code>.</li>
  <li>One important observation is that we want the words with the highest count and, among ties, the
lowest string value. So the two criteria sort in different directions. Since the counts are
positive integers, we can negate them to make the directions the same. Now we can sort in
increasing order on both word and count. We create tuples so that the negative
count comes first (that’s what we care about first) and the word itself second to
break the ties.</li>
  <li>Now we can just use <code class="language-plaintext highlighter-rouge">sorted</code>, which by default compares tuples by the first element and
moves on to the next one to break ties. We then return the first k words.
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">topKFrequent</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">words</span><span class="p">,</span> <span class="n">k</span><span class="p">):</span>
  <span class="s">"""
  :type words: List[str]
  :type k: int
  :rtype: List[str]
  """</span>
  <span class="kn">from</span> <span class="nn">collections</span> <span class="kn">import</span> <span class="n">Counter</span>
  <span class="kn">from</span> <span class="nn">operator</span> <span class="kn">import</span> <span class="n">itemgetter</span>
  <span class="n">unsorted_counter</span> <span class="o">=</span> <span class="p">((</span><span class="o">-</span><span class="n">v</span><span class="p">,</span> <span class="n">k</span><span class="p">)</span> <span class="k">for</span> <span class="n">k</span><span class="p">,</span><span class="n">v</span> <span class="ow">in</span> <span class="n">Counter</span><span class="p">(</span><span class="n">words</span><span class="p">).</span><span class="n">items</span><span class="p">())</span>
  <span class="k">return</span> <span class="nb">list</span><span class="p">(</span><span class="n">v</span> <span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="n">unsorted_counter</span><span class="p">))[:</span><span class="n">k</span><span class="p">]</span>
</code></pre></div>    </div>
  </li>
</ul>
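
<p>The same (negative count, word) ordering also works without sorting everything. Here is a minimal sketch using <code class="language-plaintext highlighter-rouge">heapq.nsmallest</code> from the standard library; the function name <code class="language-plaintext highlighter-rouge">top_k_frequent</code> is just for illustration, and whether this is actually faster depends on how small k is compared to the number of distinct words.</p>

```python
from collections import Counter
import heapq


def top_k_frequent(words, k):
    """Return the k most frequent words, ties broken alphabetically."""
    counts = Counter(words)
    # Smallest (-count, word) means highest count first, then lowest word.
    return heapq.nsmallest(k, counts, key=lambda w: (-counts[w], w))


words = ["i", "love", "leetcode", "i", "love", "coding"]
print(top_k_frequent(words, 2))  # ['i', 'love']
print(top_k_frequent(words, 3))  # ['i', 'love', 'coding']
```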

<h4 id="51-n-queens">51. <a href="https://leetcode.com/problems/n-queens/">N-Queens</a></h4>
<p>The n-queens puzzle is the problem of placing n queens on an n×n chessboard such that no two queens attack each other.</p>

<p>Given an integer n, return all distinct solutions to the n-queens puzzle.</p>

<p>Each solution contains a distinct board configuration of the n-queens’ placement, where ‘Q’ and ‘.’ both indicate a queen and an empty space respectively.</p>

<p><strong>Solution</strong></p>
<ul>
  <li>The N-queens problem has a simple backtracking solution: you start from the
first column and advance to the next column one by one. At each step you make a move, change
the state and recurse to the next column. After the call returns you undo the
move and try another move if possible. It is also possible to advance one row
at a time, but we will stick to placing queens column by column, starting from
the leftmost column.</li>
  <li><strong>State of an n-board</strong>: We can represent a solution to the n-queens problem
with an n*n matrix. However, it would be mostly zeros, since only
n of its n^2 elements are non-zero. Here I propose a different representation,
where we color the board with each move in 4 directions: row, column, right
diagonal and left diagonal. There are 2*n-1 diagonals in each direction, and
placing a queen at location (i, j) on the board corresponds to painting
row i, column j, right diagonal i+j and left diagonal n+i-j-1. The catch
is that we can only place a queen when all 4 elements corresponding to the move (i, j)
are zero. If we make a move, we set these elements to 1. Since we place exactly one queen per column,
the code only needs to track the rows and the two diagonals (there, <code class="language-plaintext highlighter-rouge">i</code> is the column index and <code class="language-plaintext highlighter-rouge">j</code> the row).</li>
  <li>Being able to reach the last column and make a move there means a
solution has been found, so we return the row of the last move. Each
recursive call that gets back a non-empty list appends its move (<code class="language-plaintext highlighter-rouge">j</code>) to each
element of that list and adds it to the <code class="language-plaintext highlighter-rouge">result</code> list built so far in that
call. So at the end, when i=0, the list <code class="language-plaintext highlighter-rouge">result</code> holds solutions of size
<strong>n</strong>.
    <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Solution</span><span class="p">(</span><span class="nb">object</span><span class="p">):</span>
  <span class="k">def</span> <span class="nf">solveNQueens</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">n</span><span class="p">):</span>
      <span class="s">"""
      :type n: int
      :rtype: List[List[str]]
      """</span>
      <span class="n">rows</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="n">n</span>
      <span class="n">r_diag</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
      <span class="n">l_diag</span> <span class="o">=</span> <span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">*</span><span class="p">(</span><span class="mi">2</span><span class="o">*</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
      <span class="k">def</span> <span class="nf">place_col</span><span class="p">(</span><span class="n">i</span><span class="p">):</span>
          <span class="n">result</span> <span class="o">=</span> <span class="p">[]</span>
          <span class="k">for</span> <span class="n">j</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
              <span class="k">if</span> <span class="n">rows</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">r_diag</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="n">j</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span> <span class="ow">and</span> <span class="n">l_diag</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="n">j</span><span class="o">+</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
                  <span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">):</span>
                      <span class="k">return</span> <span class="p">[[</span><span class="n">j</span><span class="p">]]</span>
                  <span class="n">rows</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">r_diag</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">l_diag</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="n">j</span><span class="o">+</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
                  <span class="c1"># Recurse
</span>                  <span class="n">res</span> <span class="o">=</span> <span class="n">place_col</span><span class="p">(</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
                  <span class="k">for</span> <span class="n">ls</span> <span class="ow">in</span> <span class="n">res</span><span class="p">:</span>
                      <span class="n">ls</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">j</span><span class="p">)</span>
                      <span class="n">result</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">ls</span><span class="p">)</span>
                  <span class="c1"># Backtrack
</span>                  <span class="n">rows</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">r_diag</span><span class="p">[</span><span class="n">i</span><span class="o">+</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="n">l_diag</span><span class="p">[</span><span class="n">i</span><span class="o">-</span><span class="n">j</span><span class="o">+</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
          <span class="k">return</span> <span class="n">result</span>
      <span class="n">result</span> <span class="o">=</span> <span class="n">place_col</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>

      <span class="n">f_q</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">i</span><span class="p">:</span> <span class="s">'.'</span><span class="o">*</span><span class="n">i</span> <span class="o">+</span> <span class="s">"Q"</span> <span class="o">+</span> <span class="s">'.'</span><span class="o">*</span><span class="nb">max</span><span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="n">i</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
      <span class="k">return</span> <span class="p">[[</span><span class="n">f_q</span><span class="p">(</span><span class="n">e</span><span class="p">)</span> <span class="k">for</span> <span class="n">e</span> <span class="ow">in</span> <span class="n">l</span><span class="p">]</span> <span class="k">for</span> <span class="n">l</span> <span class="ow">in</span> <span class="n">result</span><span class="p">]</span>
</code></pre></div>    </div>
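
    <p>To isolate the coloring idea from the bookkeeping of building boards, here is a stripped-down sketch that only counts solutions with the same three arrays. The function name <code class="language-plaintext highlighter-rouge">count_n_queens</code> is hypothetical and it is a simplification of the solution above, not a replacement for it.</p>

```python
def count_n_queens(n):
    """Count the placements of n non-attacking queens on an n x n board."""
    rows = [False] * n
    r_diag = [False] * (2 * n - 1)  # indexed by column + row
    l_diag = [False] * (2 * n - 1)  # indexed by column - row + n - 1

    def place_col(i):
        if i == n:
            return 1  # every column has a queen
        total = 0
        for j in range(n):
            a, b = i + j, i - j + n - 1
            if rows[j] or r_diag[a] or l_diag[b]:
                continue  # row or one of the diagonals is already painted
            rows[j] = r_diag[a] = l_diag[b] = True
            total += place_col(i + 1)
            # Backtrack
            rows[j] = r_diag[a] = l_diag[b] = False
        return total

    return place_col(0)


print(count_n_queens(4))  # 2
print(count_n_queens(8))  # 92
```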
  </li>
</ul>]]></content><author><name>Utku Evci</name><email>ue[224+1]@nyu.edu</email></author><category term="algorithms" /><category term="python" /><summary type="html"><![CDATA[Leet code is fun! Back to practicing some coding questions.]]></summary></entry><entry><title type="html">How Auto-grad works? Creating a PyTorch style Auto-grad framework</title><link href="https://evcu.github.io/ml/autograd/" rel="alternate" type="text/html" title="How Auto-grad works? Creating a PyTorch style Auto-grad framework" /><published>2018-06-06T00:00:00-04:00</published><updated>2018-06-06T00:00:00-04:00</updated><id>https://evcu.github.io/ml/autograd</id><content type="html" xml:base="https://evcu.github.io/ml/autograd/"><![CDATA[<h2 id="basic-idea-and-an-overview">Basic idea and an Overview</h2>
<p>In this post I aim to motivate and show how to write an automatic differentiation library. There are various strategies for automatic differentiation, each with different strengths and weaknesses. For an overview of the methods in use, please refer to [1]. PyTorch uses graph-based automatic differentiation.</p>

<p>Every operation performed on tensors can be represented as a node in a DAG (directed acyclic graph). In the case of neural networks, the loss value calculated for a given mini-batch is the last node of the graph. The chain rule is a very simple and yet very powerful rule. In terms of the DAG, the chain rule tells us that we can propagate the derivative through a node once the output gradient of that node is completely accumulated. If we make each node in this graph remember its parents, we can run a topological sort on the DAG and call the derivative function of the nodes in reverse topological order, starting from the loss. That’s a very simple overview of how auto-grad in <a href="https://pytorch.org/">PyTorch</a> works and it is very simple to implement! Let’s do it.</p>
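
<p>To make the “fully accumulated” condition concrete, here is a minimal hand-written sketch for a tiny graph in which one input feeds two nodes. The function <code class="language-plaintext highlighter-rouge">y = x*x + 3*x</code> and all variable names are assumptions chosen for illustration; they are not part of the framework built below.</p>

```python
# Hand-rolled backward pass for y = x*x + 3*x, where x feeds two nodes.
x = 2.0

# Forward pass: x is the parent of both u and v.
u = x * x      # u = x^2
v = 3.0 * x    # v = 3x
y = u + v

# Backward pass, visiting nodes from the output towards the input.
grad_y = 1.0             # dy/dy
grad_u = grad_y * 1.0    # y = u + v, so dy/du = 1
grad_v = grad_y * 1.0    # y = u + v, so dy/dv = 1

# x collects one contribution per consumer; only the sum is its gradient.
grad_x = 0.0
grad_x += grad_u * 2.0 * x   # contribution through u = x*x
grad_x += grad_v * 3.0       # contribution through v = 3x

# Central finite difference as an independent check.
def f(t):
    return t * t + 3.0 * t

eps = 1e-6
numeric = (f(x + eps) - f(x - eps)) / (2 * eps)
print(grad_x, numeric)
```

<p>If <code class="language-plaintext highlighter-rouge">x</code> had its own parents, propagating after only the first contribution would pass on a partial gradient, which is why the order of the backward calls matters.</p>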

<h2 id="goal-and-roadmap">Goal and Roadmap</h2>
<p>We should be able to use our framework to do the following:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">l1</span> <span class="o">=</span> <span class="n">Variable</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span><span class="mi">4</span><span class="p">).</span><span class="n">reshape</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">4</span><span class="p">))</span>
<span class="n">l2</span> <span class="o">=</span> <span class="n">Variable</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="o">-</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">).</span><span class="n">reshape</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span>
<span class="n">n1</span> <span class="o">=</span> <span class="n">dot</span><span class="p">(</span><span class="n">l1</span><span class="p">,</span><span class="n">l2</span><span class="p">)</span>
<span class="n">n2</span> <span class="o">=</span> <span class="n">relu</span><span class="p">(</span><span class="n">n1</span><span class="p">)</span>
<span class="n">n3</span> <span class="o">=</span> <span class="n">sumel</span><span class="p">(</span><span class="n">n2</span><span class="p">)</span>
<span class="n">backward_graph</span><span class="p">(</span><span class="n">n3</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">l1</span><span class="p">.</span><span class="n">grad</span><span class="p">)</span>
<span class="c1"># [[-2. -1.  0.  1.]
#  [-2. -1.  0.  1.]]
</span><span class="k">print</span><span class="p">(</span><span class="n">l2</span><span class="p">.</span><span class="n">grad</span><span class="p">)</span>
<span class="c1"># [[-4.]
#  [-2.]
#  [ 0.]
#  [ 2.]]
</span></code></pre></div></div>
<p>So we need the following:</p>

<ul>
  <li>Define a <code class="language-plaintext highlighter-rouge">Variable</code> class that wraps a numpy ndarray, supports a backward call and points to its parent <code class="language-plaintext highlighter-rouge">Variable</code>s. Use this class whenever you create a new tensor. If a <code class="language-plaintext highlighter-rouge">Variable</code> is a leaf node, it doesn’t need a <code class="language-plaintext highlighter-rouge">backward_fun</code>.</li>
  <li>Define the operations you need (<code class="language-plaintext highlighter-rouge">plus</code>, <code class="language-plaintext highlighter-rouge">minus</code>, <code class="language-plaintext highlighter-rouge">dot</code>, etc.), each of which takes one or more <code class="language-plaintext highlighter-rouge">Variable</code>s as arguments and returns a new <code class="language-plaintext highlighter-rouge">Variable</code> with the right <code class="language-plaintext highlighter-rouge">backward</code> function. The <code class="language-plaintext highlighter-rouge">backward</code> function should compute the gradient of each parent from the output gradient and pass it on to that parent.</li>
  <li>We should be able to call <code class="language-plaintext highlighter-rouge">backward_graph</code> on any <code class="language-plaintext highlighter-rouge">Variable</code>. It calls the backward function on the <code class="language-plaintext highlighter-rouge">Variable</code>s in the order given by a topological sort of the computation graph of the given <code class="language-plaintext highlighter-rouge">Variable</code>, leaving the gradients accumulated inside each <code class="language-plaintext highlighter-rouge">Variable</code>.</li>
</ul>

<h2 id="implementing-variable-class">Implementing <code class="language-plaintext highlighter-rouge">Variable</code> class</h2>
<p>Each <code class="language-plaintext highlighter-rouge">Variable</code> needs its data, which is a scalar or a <code class="language-plaintext highlighter-rouge">numpy.ndarray</code>. If it is not a leaf node, it also needs a <code class="language-plaintext highlighter-rouge">backward_fun</code>. <code class="language-plaintext highlighter-rouge">__counter</code> is an internal counter for debugging purposes. <code class="language-plaintext highlighter-rouge">self.prev</code> is a list pointing to the parents; it is initialized empty and should be set manually after creation. The backward function is called on <code class="language-plaintext highlighter-rouge">self.grad</code>, so we must guarantee that it is fully accumulated before calling <code class="language-plaintext highlighter-rouge">backward</code> on the <code class="language-plaintext highlighter-rouge">Variable</code>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Variable</span><span class="p">():</span>
    <span class="n">__counter</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span><span class="n">data</span><span class="p">,</span><span class="n">is_leaf</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span><span class="n">backward_fun</span><span class="o">=</span><span class="bp">None</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">backward_fun</span> <span class="ow">is</span> <span class="bp">None</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">is_leaf</span><span class="p">:</span>
            <span class="k">raise</span> <span class="nb">ValueError</span><span class="p">(</span><span class="s">'non leaf nodes require backward_fun'</span><span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="nb">id</span> <span class="o">=</span> <span class="n">Variable</span><span class="p">.</span><span class="n">__counter</span>
        <span class="n">Variable</span><span class="p">.</span><span class="n">__counter</span> <span class="o">+=</span> <span class="mi">1</span>

        <span class="bp">self</span><span class="p">.</span><span class="n">is_leaf</span> <span class="o">=</span> <span class="n">is_leaf</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">prev</span> <span class="o">=</span> <span class="p">[]</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">backward_fun</span> <span class="o">=</span> <span class="n">backward_fun</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">data</span> <span class="o">=</span> <span class="n">data</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">grad</span> <span class="o">=</span> <span class="mi">0</span>

    <span class="k">def</span> <span class="nf">backward</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">backward_fun</span><span class="p">(</span><span class="n">dy</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">grad</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="k">return</span> <span class="sa">f</span><span class="s">'Variable(id:</span><span class="si">{</span><span class="bp">self</span><span class="p">.</span><span class="nb">id</span><span class="si">}</span><span class="s">, data:</span><span class="si">{</span><span class="bp">self</span><span class="p">.</span><span class="n">data</span><span class="si">}</span><span class="s">, grad:</span><span class="si">{</span><span class="bp">self</span><span class="p">.</span><span class="n">grad</span><span class="si">}</span><span class="s">, prev:</span><span class="si">{</span><span class="nb">list</span><span class="p">(</span><span class="nb">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">a</span><span class="si">:</span><span class="n">a</span><span class="p">.</span><span class="nb">id</span><span class="p">,</span><span class="bp">self</span><span class="p">.</span><span class="n">prev</span><span class="p">))</span><span class="si">}</span><span class="s">, is_leaf:</span><span class="si">{</span><span class="bp">self</span><span class="p">.</span><span class="n">is_leaf</span><span class="si">}</span><span class="se">\n</span><span class="s">'</span>
</code></pre></div></div>

<h2 id="implementing-operations">Implementing Operations</h2>
<p>Each operation creates the <code class="language-plaintext highlighter-rouge">backward_fun</code> of the new <code class="language-plaintext highlighter-rouge">Variable</code> as a closure bound to the parents. One could instead implement this part with generic functions that take the parents as parameters each time. This is possible and might lead to better run-time performance. However, this is not our primary concern here, so we go with the closures.</p>

<p><code class="language-plaintext highlighter-rouge">backward_fun</code> of <code class="language-plaintext highlighter-rouge">dot</code> is simple: the gradient for each input is the dot product of <code class="language-plaintext highlighter-rouge">dy</code> with the transpose of the other <code class="language-plaintext highlighter-rouge">Variable</code>’s data.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">dot</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">):</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="n">Variable</span><span class="p">)</span> <span class="ow">and</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">b</span><span class="p">,</span><span class="n">Variable</span><span class="p">)):</span>
        <span class="k">raise</span> <span class="nb">ValueError</span><span class="p">(</span><span class="s">'a,b needs to be a Variable instance'</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">b_fun</span><span class="p">(</span><span class="n">dy</span><span class="p">):</span>
        <span class="k">if</span> <span class="n">np</span><span class="p">.</span><span class="n">isscalar</span><span class="p">(</span><span class="n">dy</span><span class="p">):</span>
            <span class="n">dy</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">ones</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="o">*</span><span class="n">dy</span>
        <span class="n">a</span><span class="p">.</span><span class="n">grad</span> <span class="o">+=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">dy</span><span class="p">,</span><span class="n">b</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">T</span><span class="p">)</span>
        <span class="n">b</span><span class="p">.</span><span class="n">grad</span> <span class="o">+=</span> <span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">T</span><span class="p">,</span><span class="n">dy</span><span class="p">)</span>
    <span class="n">res</span> <span class="o">=</span> <span class="n">Variable</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">dot</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">data</span><span class="p">,</span><span class="n">b</span><span class="p">.</span><span class="n">data</span><span class="p">),</span><span class="n">is_leaf</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span><span class="n">backward_fun</span><span class="o">=</span><span class="n">b_fun</span><span class="p">)</span>
    <span class="n">res</span><span class="p">.</span><span class="n">prev</span><span class="p">.</span><span class="n">extend</span><span class="p">([</span><span class="n">a</span><span class="p">,</span><span class="n">b</span><span class="p">])</span>
    <span class="k">return</span> <span class="n">res</span>
</code></pre></div></div>
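
<p>The two formulas in <code class="language-plaintext highlighter-rouge">b_fun</code> can be checked numerically. The sketch below is a standalone sanity check, not part of the framework: it reuses the inputs from the goal example, assumes the loss is simply the sum of all entries of the product, and compares the analytic gradients against central finite differences. The helper <code class="language-plaintext highlighter-rouge">numeric_grad</code> and the function <code class="language-plaintext highlighter-rouge">loss</code> are hypothetical names introduced only for this check.</p>

```python
import numpy as np

# Same inputs as the goal example; floats so finite differences work.
A = np.arange(-4, 4).reshape(2, 4).astype(float)
B = np.arange(-2, 2).reshape(4, 1).astype(float)

def loss(A, B):
    # Scalar loss: sum of all entries of the matrix product.
    return np.sum(np.dot(A, B))

# Analytic gradients, using the same formulas as b_fun in dot().
dy = np.ones((2, 1))          # gradient of a plain sum w.r.t. the product
grad_A = np.dot(dy, B.T)
grad_B = np.dot(A.T, dy)

def numeric_grad(f, X, eps=1e-6):
    # Central finite differences, perturbing one entry of X at a time.
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        old = X[idx]
        X[idx] = old + eps
        up = f()
        X[idx] = old - eps
        down = f()
        X[idx] = old          # restore the entry before moving on
        G[idx] = (up - down) / (2 * eps)
    return G

num_A = numeric_grad(lambda: loss(A, B), A)
num_B = numeric_grad(lambda: loss(A, B), B)
print(np.allclose(grad_A, num_A, atol=1e-6))
print(np.allclose(grad_B, num_B, atol=1e-6))
```

<p>The <code class="language-plaintext highlighter-rouge">relu</code> from the goal example is left out here; for these particular inputs both entries of the product are positive, so it would act as the identity.</p>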

<p><code class="language-plaintext highlighter-rouge">backward_fun</code> of <code class="language-plaintext highlighter-rouge">relu</code> is just a mask: the gradient passes through only where the input was positive.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">relu</span><span class="p">(</span><span class="n">a</span><span class="p">):</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="n">Variable</span><span class="p">)):</span>
        <span class="k">raise</span> <span class="nb">ValueError</span><span class="p">(</span><span class="s">'a needs to be a Variable'</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">b_fun</span><span class="p">(</span><span class="n">dy</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
        <span class="c1"># a.grad starts as the scalar 0, so it cannot be indexed in place; add a masked copy of dy instead</span>
        <span class="n">a</span><span class="p">.</span><span class="n">grad</span> <span class="o">=</span> <span class="n">a</span><span class="p">.</span><span class="n">grad</span> <span class="o">+</span> <span class="n">dy</span> <span class="o">*</span> <span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">data</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span>

    <span class="n">res</span> <span class="o">=</span> <span class="n">Variable</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">maximum</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">data</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span><span class="n">is_leaf</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span><span class="n">backward_fun</span><span class="o">=</span><span class="n">b_fun</span><span class="p">)</span>
    <span class="n">res</span><span class="p">.</span><span class="n">prev</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">res</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">sumel</code> is just a broadcast when we look at the backward pass.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">sumel</span><span class="p">(</span><span class="n">a</span><span class="p">):</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="p">(</span><span class="nb">isinstance</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="n">Variable</span><span class="p">)):</span>
        <span class="k">raise</span> <span class="nb">ValueError</span><span class="p">(</span><span class="s">'a needs to be a Variable'</span><span class="p">)</span>
    <span class="k">def</span> <span class="nf">b_fun</span><span class="p">(</span><span class="n">dy</span><span class="o">=</span><span class="mi">1</span><span class="p">):</span>
        <span class="n">a</span><span class="p">.</span><span class="n">grad</span> <span class="o">+=</span> <span class="n">np</span><span class="p">.</span><span class="n">ones</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span><span class="o">*</span><span class="n">dy</span>

    <span class="n">res</span> <span class="o">=</span> <span class="n">Variable</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">sum</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">data</span><span class="p">),</span><span class="n">is_leaf</span><span class="o">=</span><span class="bp">False</span><span class="p">,</span><span class="n">backward_fun</span><span class="o">=</span><span class="n">b_fun</span><span class="p">)</span>
    <span class="n">res</span><span class="p">.</span><span class="n">prev</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">a</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">res</span>

</code></pre></div></div>

<h2 id="implementing-the-backward_engine">Implementing the backward_engine</h2>
<p>What we need to do is call <code class="language-plaintext highlighter-rouge">.backward()</code> on each variable in our computational graph. We have the whole graph available from every <code class="language-plaintext highlighter-rouge">Variable</code>, since each <code class="language-plaintext highlighter-rouge">Variable</code> points to its parents. The trick here is to call <code class="language-plaintext highlighter-rouge">.backward()</code> in the right order, since we need the <code class="language-plaintext highlighter-rouge">.grad</code> of a <code class="language-plaintext highlighter-rouge">Variable</code> to be fully accumulated before its <code class="language-plaintext highlighter-rouge">.backward()</code> call. To ensure this we do a topological sort and call <code class="language-plaintext highlighter-rouge">.backward()</code> accordingly.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">backward_graph</span><span class="p">(</span><span class="n">var</span><span class="p">):</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">var</span><span class="p">,</span><span class="n">Variable</span><span class="p">):</span>
        <span class="k">raise</span> <span class="nb">ValueError</span><span class="p">(</span><span class="s">'var needs to be a Variable instance'</span><span class="p">)</span>
    <span class="n">tsorted</span> <span class="o">=</span> <span class="n">__top_sort</span><span class="p">(</span><span class="n">var</span><span class="p">)</span>

    <span class="n">var</span><span class="p">.</span><span class="n">grad</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">ones</span><span class="p">(</span><span class="n">var</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">var</span> <span class="ow">in</span> <span class="nb">reversed</span><span class="p">(</span><span class="n">tsorted</span><span class="p">):</span>
        <span class="n">var</span><span class="p">.</span><span class="n">backward</span><span class="p">()</span>
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">__top_sort</span><span class="p">(</span><span class="n">var</span><span class="p">):</span>
    <span class="n">vars_seen</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
    <span class="n">top_sort</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">def</span> <span class="nf">top_sort_helper</span><span class="p">(</span><span class="n">vr</span><span class="p">):</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">vr</span> <span class="ow">in</span> <span class="n">vars_seen</span><span class="p">)</span> <span class="ow">or</span> <span class="n">vr</span><span class="p">.</span><span class="n">is_leaf</span><span class="p">:</span>
            <span class="k">pass</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">vars_seen</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">vr</span><span class="p">)</span>
            <span class="k">for</span> <span class="n">pvar</span> <span class="ow">in</span> <span class="n">vr</span><span class="p">.</span><span class="n">prev</span><span class="p">:</span>
                <span class="n">top_sort_helper</span><span class="p">(</span><span class="n">pvar</span><span class="p">)</span>
            <span class="n">top_sort</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">vr</span><span class="p">)</span>
    <span class="n">top_sort_helper</span><span class="p">(</span><span class="n">var</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">top_sort</span>
</code></pre></div></div>
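
<p>The ordering itself can be seen on a plain dictionary graph, without any <code class="language-plaintext highlighter-rouge">Variable</code>s. This is a minimal sketch of the same post-order depth-first search; the graph, the node names and the function <code class="language-plaintext highlighter-rouge">top_sort</code> are made up for illustration, and unlike <code class="language-plaintext highlighter-rouge">__top_sort</code> above it keeps leaf nodes in the result.</p>

```python
def top_sort(parents, root):
    """Post-order DFS from `root`, following parent links."""
    seen = set()
    order = []

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in parents.get(node, []):
            visit(parent)
        order.append(node)   # appended only after all of its parents

    visit(root)
    return order

# Hypothetical graph: x feeds both u and v, which are summed into y.
parents = {"y": ["u", "v"], "u": ["x"], "v": ["x"]}

order = top_sort(parents, "y")
backward_order = list(reversed(order))
print(backward_order)
```

<p>In the reversed order every node appears before its parents, so by the time <code class="language-plaintext highlighter-rouge">x</code> is reached both <code class="language-plaintext highlighter-rouge">u</code> and <code class="language-plaintext highlighter-rouge">v</code> have already added their contributions to its gradient.</p>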

<p>Note that we could make the <code class="language-plaintext highlighter-rouge">.backward()</code> calls inside the <code class="language-plaintext highlighter-rouge">__top_sort</code> function, and this might be slightly more efficient. We, again, pick the easy-to-understand way of implementing things.</p>

<h2 id="enabling-higher-order-gradients">Enabling higher order gradients</h2>
<p>Note that in the backward pass we don’t return <code class="language-plaintext highlighter-rouge">Variable</code>s. It is straightforward to enable higher-order gradients by returning <code class="language-plaintext highlighter-rouge">Variable</code>s from the backward pass. To do that we need to use the operations we defined above inside every <code class="language-plaintext highlighter-rouge">backward_fun</code>.</p>

<p>For the rest of the code and some tests, please refer to https://github.com/evcu/numpy_autograd</p>

<p>[1] Automatic differentiation in machine learning: a survey https://arxiv.org/abs/1502.05767</p>]]></content><author><name>Utku Evci</name><email>ue[224+1]@nyu.edu</email></author><category term="ml" /><category term="python" /><category term="ml" /><summary type="html"><![CDATA[Autograd is not a magic. It is a very simple idea implemented carefully]]></summary></entry><entry><title type="html">Passing variables by value in Python</title><link href="https://evcu.github.io/notes/python_pass_by_value/" rel="alternate" type="text/html" title="Passing variables by value in Python" /><published>2018-02-21T00:00:00-05:00</published><updated>2018-02-21T00:00:00-05:00</updated><id>https://evcu.github.io/notes/python_pass_by_value</id><content type="html" xml:base="https://evcu.github.io/notes/python_pass_by_value/"><![CDATA[<h3 id="passing-variables-by-value-in-python">Passing variables by value in Python</h3>

<p>Let’s say you want to create a function that uses a variable from its outer scope.
By default Python creates a closure for each such function, and the variables
are evaluated when the function is called. You need to be careful about this
when you are coding!</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f_list</span><span class="o">=</span><span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">):</span>
    <span class="n">f_list</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="k">lambda</span> <span class="n">a</span><span class="p">:</span><span class="k">print</span><span class="p">(</span><span class="n">a</span><span class="o">*</span><span class="n">i</span><span class="p">,</span><span class="n">end</span><span class="o">=</span><span class="s">','</span><span class="p">))</span>

<span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">f_list</span><span class="p">:</span> <span class="n">f</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span> <span class="c1">#prints: 20,20,20,20,20,
</span></code></pre></div></div>

<p>In this example we intend to create a list of functions that print different multiples
of the input value. However, since the variable <code class="language-plaintext highlighter-rouge">i</code> is looked up by name rather than copied, it is not bound until
we call the functions in the second loop. Since <code class="language-plaintext highlighter-rouge">i</code> is left at 4 at the end of the first
loop, all of the functions multiply the input by 4.</p>
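
<p>The late lookup can be observed directly. In the sketch below the loop is wrapped in a hypothetical function, <code class="language-plaintext highlighter-rouge">make_multipliers</code>, purely so that there is a real closure cell to inspect; in the module-level loop above <code class="language-plaintext highlighter-rouge">i</code> is a global instead, but it is looked up just as late.</p>

```python
def make_multipliers():
    fs = []
    for i in range(3):
        fs.append(lambda a: a * i)   # `i` is looked up when the lambda runs
    return fs

fs = make_multipliers()
results = [f(5) for f in fs]
print(results)   # [10, 10, 10]: every lambda sees the final i == 2

# All three lambdas share one closure cell, which holds the last value of i.
cells = [f.__closure__[0].cell_contents for f in fs]
print(cells)     # [2, 2, 2]
```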

<p>So how can we evaluate <code class="language-plaintext highlighter-rouge">i</code> at definition time and bind it to
its current value? One way to do it is to use named arguments with default values!</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">f_list2</span><span class="o">=</span><span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">5</span><span class="p">):</span>
    <span class="n">f_list2</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="k">lambda</span> <span class="n">a</span><span class="p">,</span><span class="n">i</span><span class="o">=</span><span class="n">i</span><span class="p">:</span> <span class="k">print</span><span class="p">(</span><span class="n">a</span><span class="o">*</span><span class="n">i</span><span class="p">,</span><span class="n">end</span><span class="o">=</span><span class="s">','</span><span class="p">))</span>

<span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">f_list2</span><span class="p">:</span> <span class="n">f</span><span class="p">(</span><span class="mi">5</span><span class="p">)</span>  <span class="c1">#prints: 0,5,10,15,20,
</span>
</code></pre></div></div>

<p>And it works like
a charm. Please use it on your <code class="language-plaintext highlighter-rouge">lambda</code> or regular <code class="language-plaintext highlighter-rouge">def</code> functions ;)</p>]]></content><author><name>Utku Evci</name><email>ue[224+1]@nyu.edu</email></author><category term="notes" /><category term="python" /><summary type="html"><![CDATA[Python closures are great, but what if you wanna pass the variable by value]]></summary></entry><entry><title type="html">Reflection 2017</title><link href="https://evcu.github.io/personal/reflection2017/" rel="alternate" type="text/html" title="Reflection 2017" /><published>2017-12-31T00:00:00-05:00</published><updated>2017-12-31T00:00:00-05:00</updated><id>https://evcu.github.io/personal/reflection2017</id><content type="html" xml:base="https://evcu.github.io/personal/reflection2017/"><![CDATA[<p>Lets do it! This time, on time, 31.12.2017 around 19:40.</p>

<p>Last year’s reflection is <a href="https://evcu.github.io/personal/reflection2016/">here</a>.</p>

<p>Okay how is life going Utku? I am doing fine and honestly I am afraid to say no. Maybe you are asking the wrong question.</p>

<p>Do you feel strong Utku? I am afraid, no. And I hate this answer. Maybe I shouldn’t.</p>

<p>Can you summarize a whole year in one day objectively? No.</p>

<p>So what are we doing this time? A remembering exercise, and maybe some thoughts that come up during the process. That’s what we do today. Let’s see what we did last year.</p>
<h3 id="highlights">Highlights</h3>
<ul>
  <li><strong>Travel</strong>: <a href="https://www.tumblr.com/blog/se2sf">seattle2sanfransisco</a>, <a href="https://www.youtube.com/watch?v=tElf982F2FA&amp;list=PLQ-hmWDr8eC8tOr8c-gihLqfgjJSr6zek">summer2017 videos</a>
    <ul>
      <li>Atlanta2Miami</li>
      <li>4 days in Marmaris/Datca.</li>
      <li>Vancouver/Victoria: Sea plane</li>
      <li>Roadtrip 2 San Francisco: eclipse, McMenamins Hotel, Redwoods, Half Dome…</li>
      <li>Toronto/upstateNY</li>
      <li>LasVegas+Some Canyons</li>
    </ul>
  </li>
  <li><strong>Events</strong>:
    <ul>
      <li>Burning Man+WA-Regional</li>
      <li>Gorge Amphitheatre</li>
      <li>Gogol Bordello in December</li>
      <li>Caught football in CenturyLink Field</li>
    </ul>
  </li>
  <li><strong>Personal</strong>
    <ul>
      <li>Amazon AWS internship.</li>
      <li>Completed the Niagara Marathon.</li>
      <li>Got my first paycheck.</li>
      <li>Spent the summer in Washington</li>
      <li>Did 5 road-trips with a car in the US (2 of them alone) and 2 short trips.</li>
      <li>Moved to Manhattan!</li>
    </ul>
  </li>
  <li><strong>Learning</strong>
    <ul>
      <li>Learned pytorch, openMPI/MP, CUDA, SPARK</li>
      <li>Learned how to write proper libraries with proper tests.</li>
      <li>Learned crypto-currencies.</li>
      <li>Learned sewing and some LED/EL-wire stuff.</li>
      <li>Started learning tennis.</li>
      <li>Improved at bouldering.</li>
      <li>Improved in English.</li>
    </ul>
  </li>
  <li><strong>Academic &amp; Career</strong>
    <ul>
      <li>Submitted paper with Levent to the NIPS.</li>
      <li>Got full time offer from Amazon.</li>
      <li>Started earning money as a grader.</li>
    </ul>
  </li>
  <li><strong>External</strong>
    <ul>
      <li>Broke-up with J.</li>
    </ul>
  </li>
</ul>

<h3 id="lowlights">Lowlights</h3>
<ul>
  <li>
    <p><strong>Downtimes</strong>: I am not sure whether this was the case all the time. Maybe it wasn’t, maybe it was. Maybe I should write in Turkish, maybe not. But looking back at my diary I can see many times I complained about not having a goal and feeling lonely in my mind. I am not sure whether this is something that would get any better by thinking on it and reflecting. I am not even sure whether this is a waste of time. Is there a thing called waste of time when you don’t have a big goal? Do you have a big goal? What is that Utku? These are the questions that keep coming to me.</p>
  </li>
  <li>
    <p><strong>Career in ML</strong>: Didn’t flourish in ML and spent a semester without research. Now I feel like I missed a train.</p>
  </li>
  <li>
    <p><strong>Friendships</strong>: I am not satisfied with my friendships in the US in general; I miss my friends back home. I think there is a trick here. Sometimes I feel like I don’t like people and I think this affects my relationships. I am happy by myself. Are you?</p>
  </li>
</ul>

<h3 id="general-comments">General Comments</h3>
<ul>
  <li>I started this piece of text yesterday and I am trying to finish it today, on the first day of the year. There are some things I want to point out. I need to assemble the parts. I have X amount of energy now and the amount can change in either direction. So what do you do? You can try to fix certain stuff or try to improve your capacity. Play your cards and then what? What is the end? That is the first thing you should ask.</li>
</ul>

<p>I wrote the following in my first attempt, trying to reduce and reason:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Creating Moments/Plan! --&gt;Need to dream first! --&gt; You need to look out for things to do and think of new things. Find new channels of information.
...constant happiness. --&gt; Optimizing the way you live --&gt; Be proactive and get excited about the change, be like a feather.--&gt;Think of
...find inspiring people and things. --&gt; Try-out new things and learn &amp; Be sleepy.
...love people. --&gt; Find people to love &amp; touch them. --&gt; Increase your interest in others.
...know interesting people --&gt; Try-out new things and learn and be happy about yourself
...be stronger(mentally and physically). I want to be a fighter --&gt; Attack things --&gt; Find things to attack.
...minimize the stress in my life. --&gt; Optimizing the way you live
...earn money--&gt;aim goals in the career.
...be more connected to the community --&gt; find local gatherings&amp;organizations.
...know yourself more. What makes you happy what


...to think and remember
...not stuck into the old plans. I can forget it and re-discover.
</code></pre></div></div>

<p>I feel like there are some main topics.</p>

<ul>
  <li>Career-related Development
    <ul>
      <li>To back-log. Update goals.</li>
      <li>Publish</li>
      <li>Find new goals.</li>
      <li>Meet new people, enlarge the network</li>
    </ul>
  </li>
  <li>Personal Development
    <ul>
      <li>Find and do interesting stuff</li>
      <li>Find interesting people, love them</li>
      <li>Learn new gathering</li>
    </ul>
  </li>
  <li>Every-Day Optimization
    <ul>
      <li>How to have fun</li>
      <li>How to eat healthy</li>
      <li>How to exercise</li>
      <li>How to generate time</li>
    </ul>
  </li>
  <li>Long/Mid-Term dreams
    <ul>
      <li>Think about life and decide on a direction</li>
      <li>Generate dreams and plan experiences</li>
    </ul>
  </li>
</ul>

<p>I feel like setting up a three-month plan, but I don’t feel like doing it now. So I will postpone. For now.
But this is the plan:</p>

<ol>
  <li>Set up a repo for various lists (like things you wanna do, a dream list).</li>
  <li>Set up reminders for the following focuses. A focus is a session where you read the last focus report and backlog and generate a new one. Before starting, go over the definition (what to focus on, what to do) of each focus (first create them).
  a. Focus on your Career-related Development: every month
  b. Focus on your Personal Development: every month
  c. Focus on your Every-Day Optimization: every 2 weeks
  d. Focus on your Long/Mid-Term dreams: every 2 weeks</li>
  <li>After three months, read your reports, evaluate, and decide to stop or continue.</li>
</ol>]]></content><author><name>Utku Evci</name><email>ue[224+1]@nyu.edu</email></author><category term="personal" /><category term="personal" /><category term="reflection" /><summary type="html"><![CDATA[A delayed reflection about the past one and some words for the next]]></summary></entry></feed>