August’17 Golang Bangalore Meetup

The August Golang Bangalore Meetup was conducted on Saturday, August 26, 2017 at Red Hat India Pvt. Ltd. Since the event took place around the holidays, fewer people than usual turned up.

The meetup started at 10:30 with the first talk by Nurali Virani who works at SAP Labs, Bengaluru. He talked about “Understanding Slice & Map in Golang”. Nurali’s talk was beginner friendly. He explained the concepts in great detail through live coding and addressed every question raised by the participants. The code written by Nurali during his demo can be found here.

The next talk was given remotely by Steve Manuel. Steve (@nilslice) lives in Boulder, Colorado. He is the co-founder of Boss Sauce Creative. Steve talked about his open source project Ponzu. Ponzu is a headless CMS with an automatic JSON API, featuring auto HTTPS, HTTP/2 Server Push, and a flexible server framework written in Go. The slides related to his talk can be found here. Other related resources: Github, Docs, Addons. To know more about Ponzu, join #ponzu on gophers.slack.com. You can get an invitation to join Slack from here: https://invite.slack.golangbridge.org

Fortunately, Steve’s talk was recorded. The recording can be found here.

I thank Udayakumar Chandrashekhar and Red Hat India Pvt. Ltd. for helping us to organize the August Golang Bangalore Meetup by providing venue and food.


June ’17 Golang Bangalore Meetup

The June Golang Bangalore Meetup was conducted on Saturday, June 17th, 2017. There were around 35-40 people who attended the meetup.


The meetup started at 10:15 with the first talk by Nurali Virani who works at SAP Labs, Bengaluru. He talked about “Understanding Type System In Go”.


Saifi gave an awesome talk about “Working with C code and plugins in Go”. The slides related to his talk can be found here.


Umasankar Mukkara is the Co-Founder and CEO at CloudByte Inc. He talked about OpenEBS and their experience with Golang. The slides of his talk can be found here. OpenEBS is an open source project written in Golang. You can find the source code of OpenEBS here.

Satyam Zode, who is also a fellow Golang programmer at OpenEBS, presented a talk about “Package oriented design in Go”. The slides related to his talk can be found here.


I thank Uma, OpenEBS and Nexus Ventures Partners for helping us to organize the June Golang Bangalore Meetup by providing venue and food. OpenEBS also gave a few goodies to the participants. Fortunately, this event was recorded and we will share the recorded videos as soon as they are ready.

 

 


Working on Jira and Bugzilla issue in Project Almighty

I joined Red Hat on 15/06/2016.

I was assigned two issues on project Almighty to work on:

  1. Create simple example of fetching Issues from Jira in Go
  2. Create simple example of fetching Issues from Bugzilla in Go

I started with the Jira issue. This was my first issue in Golang. While solving it, I learnt how to use the “go get” command, which downloads and installs an upstream package. For example, this is the upstream package which I used for the Jira issue: github.com/andygrunwald/go-jira

So, I downloaded and installed the package using the following command:

go get github.com/andygrunwald/go-jira

The package got installed:

Screenshot from 2016-06-27 12-39-46

I learnt how to take input from command-line flags using “flag” package. To do this:

  • import flag package using:

import "flag"

  • Declare the variable in which you want to store the input

var username string
flag.StringVar(&username, "uname", "", "Username")

The first argument &username is a pointer to the variable which stores the input value.
The second argument is the flag name which you use on the command line e.g. “-uname”.
The third argument is the default value of the flag input.
The fourth argument is the description of the flag.

  • Parse the flag:

flag.Parse()

Next, I learnt to work with the “net/http” package and how to parse an HTTP response in Go. Although I didn’t use the package for this issue, I did experiment with it.

I used the “reflect” package to find out the data type of the HTTP response. The two calls I used were:

reflect.TypeOf(result)
reflect.TypeOf(result).Kind()

I found that the HTTP response was a slice and that each slice element was a struct.

To access a value from the struct, I had to find out its field name. For that, I used the “reflect” package:
reflect.Indirect(reflect.ValueOf(result)).FieldByName(field)

The next issue was the Bugzilla issue. The Golang packages for Bugzilla are listed here.

I used this package to get the results –> github.com/kolo/xmlrpc

XML-RPC is used to make remote procedure calls over HTTP.

The endpoint for xml-rpc interface is xmlrpc.cgi script in the bugzilla installation. So, for Red Hat Bugzilla, the endpoint is https://bugzilla.redhat.com/xmlrpc.cgi

You can find my code here.


Machine Learning

Learning

To gain knowledge or understanding or skill through:

  • study
  • instruction
  • experience

 

Machine Learning

The field of study that gives computers the ability to learn without being explicitly programmed.

The goal is to devise programs that learn and improve their performance with experience, without human intervention.

 

Training Data

  1. Set of examples (input -> output) for learning
  2. Used to build model

 

Test data

  1. Used to test:
    • how good your model can predict
    • estimate model properties
  2. It is always outside the training data set but follows the same probability distribution as the training data

 

Feature

  1. also called predictor
  2. It is a meaningful attribute
  3. Internal representation of data
  4. quantity describing an instance
  5. property of an instance

 

Tuple

  1. A record in a database
Screenshot from 2016-05-02 23-34-01
Features are columns and Tuples are rows

 

If we increase the number of records and attributes in a data set, then a Machine Learning problem also becomes a Big Data problem.

 

Supervised learning

  1. It uses training data set consisting of input -> correct output to train the model
  2. Example:
    • Page Ranking Algorithm
    • Next word recommendation in Instant Messaging Application/ Whatsapp/ SMS

 

Unsupervised learning

  1. No training data set exists
  2. Most difficult algorithms are unsupervised learning because there is no “fixed” objective.
  3. used in Exploratory Data Analysis (EDA)
  4. Example:
    • used in recommendation systems to determine users who are similar to me from existing database

 

Figure: Machine Learning types

 

Classification vs. Clustering

Classification:

  • We have a set of pre-defined classes and we want to know which class a new object belongs to.
  • It is predictive modelling. We give pre-defined groups and predict the group of new data.

Clustering:

  • Group a set of objects and find whether there is some relationship between objects.
  • It is descriptive modelling. We try to find groups which occur naturally in data.

 

Classification

There are 6 items categorised in 2 classes:

Figure: Example of Classification

 

Each category has a label e.g. “Eatables” and “Non-Eatables”. If we have to predict the class of a new item “strawberry”, then it will be assigned the label “Eatables”.

 

Clustering

There are 6 items categorised in 2 groups:

Figure: Example of Clustering

 

Each category is unnamed i.e. there is no label attached to the group. If we have to predict the group of a new item “strawberry” then it will be in the first group.

 

Accuracy

  1. How often is the prediction correct?
  2. Accuracy is not a reliable metric for the real performance of a model because it yields misleading results if the data set is unbalanced (i.e. the number of samples in different classes varies greatly).
  3. Example:
    1. Let number of cats be 95 and number of dogs be 5
    2. Classifier can easily bias into classifying all samples as cats
    3. Overall accuracy = 95%
    4. BUT 100% recognition rate for cats and 0% recognition rate for dogs
  4. One of the ways to improve accuracy is to provide more balanced data.
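
The cat and dog numbers above can be checked with a few lines of Python. This is only a minimal sketch and not code from the talk: the function names accuracy and recognition_rate and the "always say cat" classifier are my own assumptions for illustration.

```python
from collections import Counter

def accuracy(actual, predicted):
    """Fraction of predictions that match the true labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def recognition_rate(actual, predicted):
    """Per-class fraction of samples that were predicted correctly."""
    totals = Counter(actual)
    hits = Counter(a for a, p in zip(actual, predicted) if a == p)
    return {label: hits[label] / totals[label] for label in totals}

# 95 cats and 5 dogs, as in the example above.
actual = ["cat"] * 95 + ["dog"] * 5
# A biased (hypothetical) classifier that labels every sample as a cat.
predicted = ["cat"] * 100

print(accuracy(actual, predicted))          # 0.95
print(recognition_rate(actual, predicted))  # {'cat': 1.0, 'dog': 0.0}
```

The overall number looks good, yet the per-class view shows that not a single dog was recognised.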

 

One of the interesting things Satish Patil explained in the Pune Python Meetup was:

There is no right or wrong model. There is no best or worst model. There is ONLY useful and non-useful model. 

Nobody knows how much percentage of accuracy is good. How much accuracy is needed depends on Business Context.

Consider a company which wants to launch a new product and they want the probability of success of the product using Machine Learning. So, it is the company which DECIDES that if they get probability below 60%, then they will not launch the product. So, this is not something that the developer decides. This totally depends on the business context.

 

Market Basket Analysis

  1. Also called affinity analysis
  2. Association Rule:
    • discovering interesting relation/connection/association between specific objects
  3. Sometimes, certain products are typically purchased together like:
    • beer and chips
    • beer and diapers
    • bread and eggs
    • shampoo and conditioner
  4. So, market basket analysis tells a retailer that promotion involving just one of the items from the set would likely drive sales of the other
  5. This technique is used by retailers to:
    • improve product placement
    • marketing
    • new product development
    • making discount plans
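
The notes above do not say how an association is scored. A common choice (my assumption here, not something from the talk) is to compute the support and confidence of a rule. The sketch below uses a tiny, invented set of baskets just to show the arithmetic; the function names are hypothetical.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

# Invented toy transactions, for illustration only.
baskets = [
    {"beer", "chips"},
    {"beer", "chips", "diapers"},
    {"bread", "eggs"},
    {"beer", "diapers"},
]

print(support(baskets, {"beer", "chips"}))       # 0.5 (2 of 4 baskets)
print(confidence(baskets, {"beer"}, {"chips"}))  # 2 of the 3 beer baskets also contain chips
```

A rule such as "beer -> chips" with high confidence is the kind of relation a retailer could use for placement or discount plans.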

 

Titanic Data Set

The titanic data set was used in the machine learning talk in Pune Python Meetup. It can be downloaded here.

There are some features in the data set which can be ignored as they are not important like:

  • Passenger ID
  • Name
  • Ticket Number
  • Cabin

and there are some important features which help in classifying like:

  • Survived
  • Gender

 

Impurity Measure

  1. Measures how well the classes are separated
  2. Should be 0 when all data belong to one class

 

Entropy

  1. Entropy can be used as a measure of the quality of a model
  2. It is a measure of how spread out the probabilities are.
  3. The more equal the share of the probability values across the classes, the higher the entropy. The more skewed the share among the classes, the lower the entropy.
  4. The goal in machine learning is to get a very low entropy in order to make the most accurate decisions and classifications.
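
To see these properties in numbers, here is a small sketch. It assumes the usual Shannon entropy with a base-2 logarithm; the notes do not give a formula, so treat that choice (and the function name) as my assumption.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    total = len(labels)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

print(entropy(["yes"] * 10))              # 0.0, all data belong to one class
print(entropy(["yes"] * 5 + ["no"] * 5))  # 1.0, evenly split between two classes
print(entropy(["yes"] * 9 + ["no"] * 1))  # roughly 0.47, skewed so lower than 1.0
```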

 

Decision Tree

  1. A way of graphically representing a sequential decision process
  2. Non-leaf nodes are labelled with attribute/ question
  3. Leaf nodes are labelled with class
Figure: Decision tree based on the Titanic data set

 

Pruning

  1. Data can contain noise:
    • instance can contain error
    • wrong classification
    • wrong attribute value
  2. If a particular feature is not used by a tuple or if the feature is not influencing, then it is removed.

 

Data Preprocessing

  1. Converting data into interval form
  2. Machine learning algorithms learn from data, so it’s important to feed them the right data
  3. Data preprocessing basically involves:
    • correcting mistakes
    • handle missing values
    • handle outliers
    • normalize values
    • nominal values

 

Missing Value

  1. The value of an attribute which is not known or does not exist
  2. Example:
    • value was not measured
    • instrument malfunction
    • attribute does not apply
  3. If a column contains “Not Available”, then it is NOT considered as a missing value.

 

Outliers

  1. samples which are far away from other samples
  2. They can be mistake/ noise or represent a special behaviour
  3. Outliers are generally removed

 

Questions that were asked in meetup

  1. Can data be extended to multiple dimension?
  2. Can distance be other than Euclidean?
    • Yes, Manhattan distance
  3. Are there online courses that teach ML intro?
    • Yes
  4. What is “k” in k-means?
    • k is no. of clusters
  5. Can we use ML for trading?
    • Yes
  6. Any daily life clustering example?
  7. Any software product based on unsupervised learning?
    • Google Maps
    • Matrimony/ Dating websites
    • Red Coupon (real estate)
    • Amazon recommendation
    • Netflix
  8. Is the order in which features are given important?
    • No
  9. Why do we say that one model is better than the other?
  10. What if accuracy is not the concern?
    • Accuracy is one way of looking at prediction
  11. Do you think that if model changes, something in feature has changed?
  12. We have tools like WEKA, so why would anyone prefer Python or R?
    • depends on the language available or language the company uses
  13. How do we know that a particular feature is important or not?
  14. What if some features are more influential than others? How will the decision tree be affected?
  15. How to handle outliers in a decision tree?
  16. Will the algorithm figure out the relationship between input and output?
    • This is possible through Regression

 


Event Report: April Pune Python Meetup

April Pune Python Meetup (@PythonPune) was conducted on April 30, 2016 at Redhat, Pune. Around 70 people registered for the meetup but the turnout was around 72-73. A few people registered on the spot.

Python Pune Meetups are organised by Chandan Kumar (@ciypro) who is a fellow RedHat employee, a python programmer and FOSS enthusiast who has contributed to many upstream projects.

The meetup started around 10:45 with an introduction where everybody introduced themselves. Almost everybody knew Python; only 1-2 people did not. A few people were experienced in machine learning and some were completely new to it. I had a course on machine learning in college where I learnt the theory and did some practical assignments in R. The crowd was diverse, consisting of students, data scientists, professors and people of various age groups from 18 to 70.

The speakers at this meetup were Satish Patil (@DataGeekSatish) and Sudarshan Gadhave (@sudarshan1989), who took a session on Introduction to Machine Learning.

Satish Patil in Pune Python Meetup

 

Sudarshan Gadhave in Pune Python Meetup

Satish Patil is the Founder and Chief Data Scientist of Lemoxo Technologies, Pune where he advises companies large and small on their data strategy. He has 10+ years of research experience in the field of drug discovery and development. He shared a few real life machine learning examples from his field in the meetup!

Satish is passionate about applying technology, artificial intelligence, design thinking and cognitive science to better understand, predict and improve business functions. He has a great interest in Machine Learning, Artificial Intelligence, Data Visualisation, Big Data.

Satish covered the following topics:

  • What is Machine Learning
  • The Black Box of Machine Learning
  • features
  • training and test data set
  • classification
  • clustering
  • pure and impure states
  • entropy
  • decision tree
  • supervised and unsupervised learning
  • market basket analysis
  • data pre-processing
  • Titanic data set
  • K means algorithm

Machine Learning is a vast subject and it definitely requires more sessions to grasp, but Satish made a remarkable effort to help us understand all the above topics in layman’s terms.

There are a lot of books, courses and material available online for Machine Learning, so why this meetup? Well, the best part about this meetup was the way Satish explained the BUSINESS CONTEXT of MACHINE LEARNING. This was something new for me to learn. Getting to know the real life examples from an entrepreneur-cum-data scientist was really interesting.

The Machine Learning Workshop in Pune Python Meetup

The details of his talk will be in my next blog.

Chandan Kumar talked about Fedora Labs. The Fedora science spin comes pre-installed with essential tools for scientific and numerical work like an IDE, and tools and libraries for programming in Python, C, C++, Java and R. It basically eliminates the need to download a bunch of scientific packages yourself.

If you need any help regarding the spin, you can get help from #fedora-science channel on Freenode on IRC.

As Chandan Kumar ALWAYS encourages us to contribute to open source, he introduced us to WHAT CAN I DO FOR FEDORA?. Pune Python meetups and Devsprint are a great platform to seek help if you want to contribute to open source.

Chandan Kumar in Pune Python Meetup

 

Thanks to Satish Patil and Sudarshan Gadhave for conducting an awesome workshop! We hope to see more such workshops by you in the meetups.

Thanks to RedHat for the food, beverages and venue.

Thanks to Chandan Kumar, Pravin Kumar (@kumar_pravin), Amol Kahat, Sudhir Verma for organising such interesting meetups where we always learn something new 🙂

 

 


Asynchronous programming

I spent today learning about asynchronous programming and these are my notes related to it.

So, there are two kinds of systems -> synchronous and asynchronous.

In a synchronous system, you wait for a task to finish COMPLETELY before you move on to some other task.

In an asynchronous system, you move on to some other task before the first one finishes. This allows more concurrency.

Event loop

  1. If we want to do something asynchronously in a programming language, we use an event loop
  2. An event loop can do the following things:
    • register tasks to be executed
    • execute the tasks
    • delay execution of tasks
    • cancel tasks
  3. Every event is attached to event listener or else the event gets lost
  4. The main purpose of an event loop is:
    • run the first function
    • while that function waits for IO, the event loop pauses it and runs another function
    • when the result the first function was waiting for arrives, the event loop resumes it
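
Here is a minimal sketch of that behaviour with Python’s asyncio, written with the async/await syntax that current Python versions use. The function name fetch, the delays and the log list are my own assumptions; asyncio.sleep stands in for real IO.

```python
import asyncio

async def fetch(name, delay, log):
    """Pretend to wait on IO for `delay` seconds, recording progress in `log`."""
    log.append(f"{name} started")
    await asyncio.sleep(delay)   # control goes back to the event loop here
    log.append(f"{name} finished")
    return name

async def main():
    log = []
    # Register both coroutines with the event loop and run them concurrently.
    results = await asyncio.gather(
        fetch("first", 0.2, log),
        fetch("second", 0.1, log),
    )
    return results, log

results, log = asyncio.run(main())
print(results)  # ['first', 'second']
print(log)      # ['first started', 'second started', 'second finished', 'first finished']
```

While "first" is waiting, the loop runs "second", which is why "second" finishes earlier even though it was started later.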

 

Generators

  1. used to create iterators
  2. generators return multiple items but NOT as a list. They return items one by one

 

Difference between a normal callback function and a generator:

  • Normal callback function approach: after collecting ALL the results, it displays them all together.
  • Generator approach: it displays results as it finds them.

Advantages:

  • space efficient (no need to store all data at once)
  • time efficient (iteration may finish before all items are needed)
  • user-friendly (allows parallelism)

 

Yield

A function becomes a generator when it uses “yield”

 

Difference between a normal return statement and yield statement:

Return:

In a function which uses a return statement,

  • local variables are destroyed
  • the scope is destroyed
  • if the function is called again, a fresh set of variables is created

Yield:

In a function which uses a yield statement,

  • local variables are not destroyed
  • scope is preserved
  • we can resume the function where we left off
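
A small sketch of the difference. The two function names are just examples I made up; both compute the same squares.

```python
def squares_list(n):
    """Return approach: build the whole list, then hand it back at once."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result

def squares_gen(n):
    """Yield approach: hand back one value at a time, keeping local state."""
    for i in range(n):
        yield i * i   # execution pauses here until the next value is requested

gen = squares_gen(4)
print(next(gen))        # 0
print(next(gen))        # 1 (i was remembered between the two calls)
print(list(gen))        # [4, 9] (the remaining values)
print(squares_list(4))  # [0, 1, 4, 9]
```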

 

Coroutine 

  1. A coroutine lets several tasks make progress at once through non-preemptive multitasking
  2. It passes control to a subroutine and waits for it; the routine can be re-entered later and continue from where it left off
  3. A coroutine can suspend itself. But once it actually exits/returns, it cannot be resumed.
  4. There is no need to add a yield statement in a coroutine but the function called by a coroutine can have a yield statement

 

Future

This is the most interesting concept.

  1. Future is one way to write asynchronous code
  2. Future is result of work that has not been completed yet
  3. A call that starts asynchronous work does not return the result itself; it returns a future object. The result becomes available when the task completes. Meanwhile, the code that follows keeps executing.
  4. When do we know that state of future has changed?
    • when set_result() is called
  5. How to check that the task taken by future has been completed?
    • by using the event loop -> it watches the state of the future object to see when it is done
  6. Future is a way of performing many tasks in parallel, in efficient, non-blocking way
  7. There are two cases:
    • When computation of task does not complete -> future does not complete.
    • When computation of task completes and returns either a result or exception -> future completes
  8. The outcome held by a future can be:
    • value -> future completes successfully
    • exception -> future has failed with an exception
  9. Future has an important property.
    • It can be assigned only once
    • Once it has been given a value, it becomes immutable and can never be over-written
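
The points above can be tried out with asyncio’s own Future class. This is a minimal sketch under my own assumptions (the value 42 and the 0.05 second delay are arbitrary); it shows a pending future, set_result() completing it, and the "assigned only once" property.

```python
import asyncio

async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()   # no result yet
    states = [future.done()]        # False: still pending

    # Ask the event loop to fill in the result a little later.
    loop.call_later(0.05, future.set_result, 42)

    value = await future            # suspends until set_result() has run
    states.append(future.done())    # True: the future has completed

    try:
        future.set_result(99)       # a completed future cannot be assigned again
    except asyncio.InvalidStateError:
        reassigned = False
    else:
        reassigned = True
    return value, states, reassigned

value, states, reassigned = asyncio.run(main())
print(value, states, reassigned)  # 42 [False, True] False
```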

 

Task

  1. a subclass of future
  2. wraps a coroutine
  3. when coroutine is finished, it returns result then task is finished
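
A short sketch of these three points with asyncio. The coroutine add and its arguments are hypothetical; asyncio.sleep is a stand-in for real work.

```python
import asyncio

async def add(a, b):
    await asyncio.sleep(0.01)   # stand-in for real work
    return a + b

async def main():
    task = asyncio.create_task(add(2, 3))         # wrap the coroutine in a Task
    is_future = isinstance(task, asyncio.Future)  # Task is a subclass of Future
    result = await task                           # the task finishes when the coroutine returns
    return is_future, task.done(), result

print(asyncio.run(main()))  # (True, True, 5)
```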

 

 

Conclusion

  1. For asynchronous programming, we need event loop
  2. We register our tasks/ futures in the event loop
  3. The event loop schedules them and executes them
  4. Callbacks are used so that we are notified when tasks/ future return results
  5. Coroutines are wrapped in tasks/ futures
    • while yield is waiting for a value, the coroutine is paused
    • when yield gets a value, the coroutine continues
  6. If a coroutine does not return a value but raises an exception, then the task fails.

 

This is a simple program I tried with Python’s asyncio module:

Screenshot from 2016-04-27 17-41-52

 

Python asyncio module

 

@asyncio.coroutine:

marks a generator-based function as a coroutine (newer Python versions use async def instead)

 

loop = asyncio.get_event_loop()

returns the event loop for the current thread

 

loop.run_forever()

runs the loop until the stop() method is called


April Fedora Meetup, Pune

The Fedora meetup was conducted by Kushal Das on 23rd April 2016. Around 15 people attended the meetup at Kushal’s place.

In this meetup, Kushal introduced us to Unit Testing. It’s a term often heard in software development.

 

Testing

Testing is done to determine if there are errors in the code. It does not prove that the code is correct but it just checks that the conditions are handled correctly. Tests are as important as implementation itself.

 

Unit testing

Unit testing is done to evaluate a particular code component. These components can be a function, a class etc. It determines how well the component reacts to valid or invalid input. Unit tests are written and executed by software developers.

 

unittest – an automated testing framework in Python

The unittest module is part of Python’s standard library, so it does not need to be installed separately with pip.

I used unittest module in my python program using:

import unittest

 

Basic Test Structure

There are two parts of a basic test:

  1. The code to prepare for test (test fixture)
  2. The code to test

 

Our first Test Case

This program tests cat command in Fedora. The cat command is used to read the contents of a file. We use the cat command in the following way:

Screenshot from 2016-04-25 14-47-30.png

The following program is the first test case. It prints:

  • the output of cat command
  • the error
  • the exit code

 


I saved the script with the name “firetest.py”. Then I simply ran the script directly from the command line:

Screenshot from 2016-04-25 15-05-45

The above result shows:

  • The output of cat command.
  • The command ran successfully so error message is empty
  • The exit code is “0” (successfully completed)

After that, there is a “.” before the dashed line above which indicates that the test passed.

Next, it shows the amount of time the tests took.

After that, the test outcome i.e. OK is shown.

Every test has 3 possible outcomes:

  • OK: the test passes
  • FAILED: the test does not pass and raises an “AssertionError” exception
  • ERROR: the test raises error other than “AssertionError”
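
The three outcomes can be reproduced with a tiny sketch. This is my own illustration and not code from the meetup; the class and function names are hypothetical. The test case is defined inside a function only so that the deliberately failing tests stay contained in the demo.

```python
import io
import unittest

def run_outcomes_demo():
    """Run one passing, one failing and one erroring test; return the TestResult."""

    class Outcomes(unittest.TestCase):
        def test_ok(self):
            self.assertEqual(1 + 1, 2)            # passes

        def test_failed(self):
            self.assertEqual(1 + 1, 3)            # AssertionError -> counted as a failure

        def test_error(self):
            raise ValueError("not an assertion")  # any other exception -> counted as an error

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(Outcomes)
    # Send the runner's report to an in-memory buffer to keep the demo quiet.
    return unittest.TextTestRunner(stream=io.StringIO()).run(suite)

result = run_outcomes_demo()
print(result.testsRun, len(result.failures), len(result.errors))  # 3 1 1
```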

Next, I modified the above program and changed the file name to something else, i.e. gave a wrong file name, and then ran the script. This is the result:

Screenshot from 2016-04-25 15-18-16

Here, since the file name is wrong, the cat command throws an error. The output is empty. The error message is displayed along with the exit code “1”(unsuccessful completion).

Kushal told us that the “b” before the output and error message indicates that it is in “bytes”.

To convert the output to a string, we inserted the following line before the print statement:

Screenshot from 2016-04-25 15-23-47

 

ASSERT

An assert statement is used to verify an expectation, i.e. that some property of the code being tested is true. Assert literally means “to state as fact”.

If the assertion does not hold, the assert statement stops the routine and reports an error message.

There are many assert statements like:

  • assertEqual
  • assertNotEqual
  • assertTrue
  • assertFalse
  • assertIn
  • assertNotIn
  • assertIs
  • assertIsNot

In the meetup, we tried “assertIn”. assertIn(a, b) checks if a is in b.

The following is the code modification that we made. Here, assertIn checks if the word “Fedora” is present in the output.

Screenshot from 2016-04-25 15-54-57.png

This is the result:

Screenshot from 2016-04-25 15-56-31

This means that the string “Fedora” is present in the output of cat command.

We modified the string “Fedora” to generate an error in this way:

Screenshot from 2016-04-25 15-58-45

After running the script, this is the result:

Screenshot from 2016-04-25 15-59-57.png

The test fails as the string “Fedra” is not present in the output.
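
Since the programs above were shared as screenshots, here is a rough sketch of what such a test can look like. It is NOT the exact code from the meetup: the helper run_cat, the class CatTest and the temporary file with the invented contents “Fedora release 23” are all my assumptions, and it needs the cat command to be available on the machine.

```python
import subprocess
import tempfile
import unittest

def run_cat(path):
    """Run `cat path` and return (stdout, stderr, exit code); the output is bytes."""
    proc = subprocess.run(["cat", path], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return proc.stdout, proc.stderr, proc.returncode

class CatTest(unittest.TestCase):
    def setUp(self):
        # Test fixture: a temporary file with known (invented) contents.
        self.tmp = tempfile.NamedTemporaryFile(suffix=".txt")
        self.tmp.write(b"Fedora release 23\n")
        self.tmp.flush()

    def tearDown(self):
        self.tmp.close()

    def test_existing_file(self):
        out, err, code = run_cat(self.tmp.name)
        self.assertEqual(code, 0)
        self.assertEqual(err, b"")
        # Decode the bytes to a string before checking its contents.
        self.assertIn("Fedora", out.decode("utf-8"))

    def test_missing_file(self):
        out, err, code = run_cat(self.tmp.name + ".missing")
        self.assertNotEqual(code, 0)
        self.assertEqual(out, b"")

# unittest.main() is the usual entry point in a script; an explicit runner is used
# here so the result object can be inspected afterwards.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CatTest)
result = unittest.TextTestRunner().run(suite)
```

Changing “Fedora” to “Fedra” in assertIn makes test_existing_file fail, just like in the run above.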

Kushal, Chandan Kumar and Praveen Kumar gave all of us a simple task of testing.

My task was to:

  • Change a file’s SELinux permissions to root (not through mode)
  • Write a test case to check if it gives an error when I try to access it through normal user sudo chmod 700
  • Check if the error message is present in SELinux’s log file.

 

In this meetup I learnt about unit testing in Python. I also learnt about SELinux for the first time while solving the task.

Thanks to Kushal for providing the yummy snacks 🙂 and conducting the meetup.
