Jose Sandoval

Zipf's Law and Software Engineering
February, 2012

All well designed software applications are alike; each badly designed software application is badly designed in its own way.

For some time now I've been looking at Zipf's Law and wondering if it applies to computer programs written in any modern computer language. In other words, is Zipf's Law relevant when analyzing computer code? And if it is relevant, what does it say about the structure and the correctness of software?

Wentian Li summarizes Zipf's Law as "the observation that frequency of occurrence of some event (P), as a function of the rank (i) when the rank is determined by the above frequency of occurrence, is a power-law function $P_i \sim 1/i^a$ with the exponent a close to unity (1)."

For the sake of argument, let P (a random variable) represent the frequency of occurrence of a keyword in a program listing.

On the surface, Zipf's Law is meaningless when talking about any program written in any contemporary programming language because every programming language has a limited number of keywords and some keywords are used more than others. For example, the keyword goto exists in most modern languages, but we shy away from its use, as we've been taught that gotos are evil--and they are, though they have their place. In all likelihood, this keyword has a low frequency of occurrence in most programs.

On the other hand, the keyword for is common in algorithms dealing with data, and likely to have a higher frequency of occurrence in source code. Therefore, I conclude, without empirical proof, that keyword usage in any program written in a contemporary programming language is heavily skewed: a few keywords account for most occurrences and the rest are rare. That is the kind of distribution a power law describes, although only measurement could confirm how close the exponent is to 1.
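To make the keyword-counting idea concrete, here is a minimal Java sketch (assuming Java 8 or later). The class name KeywordRank, the small keyword subset, and the naive tokenizer are all assumptions made for illustration; a real measurement would need a proper lexer that skips strings and comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class KeywordRank {
    // A small, illustrative subset of Java keywords (an assumption, not the full list).
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "for", "if", "else", "while", "return", "class",
            "public", "static", "void", "int", "new"));

    // Counts keyword occurrences and returns them sorted by descending frequency.
    static List<Map.Entry<String, Integer>> rank(String source) {
        Map<String, Integer> counts = new HashMap<>();
        // Naive tokenizer: split on anything that is not a letter.
        for (String token : source.split("[^A-Za-z]+")) {
            if (KEYWORDS.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // Highest count first; ties are broken alphabetically so the order is stable.
        ranked.sort((a, b) -> {
            int byCount = b.getValue().compareTo(a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return ranked;
    }

    public static void main(String[] args) {
        String source = "for (int i = 0; i < n; i++) { if (a[i] > 0) { "
                + "for (int j = 0; j < i; j++) { sum += a[j]; } } } return sum;";
        List<Map.Entry<String, Integer>> ranked = rank(source);
        for (int i = 0; i < ranked.size(); i++) {
            Map.Entry<String, Integer> e = ranked.get(i);
            // With exponent a = 1, Zipf's Law would predict count(rank) ~ count(1) / rank.
            double predicted = ranked.get(0).getValue() / (double) (i + 1);
            System.out.printf("%d %s observed=%d zipf=%.2f%n",
                    i + 1, e.getKey(), e.getValue(), predicted);
        }
    }
}
```

Run against a large body of source files instead of a one-line string, the observed and predicted columns would show how closely keyword usage follows the law.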

However interesting the frequency distribution of computer language keywords is, it has little practical value. Of much more interest to software engineering is the context and combination of all the keywords--entire stories can be told in computer programs. For instance, we create entities that don't exist except in computer memory at run time; we create logic nodes that will never be tested because it's impossible to test every logic branch; we create information flows in quantities that are humanly impossible to analyze with a glance; in sum, we create chaos and order at the same time. The second law of thermodynamics is at play everywhere: the more combinations of keywords we add, the more the entropy of the state machine increases.

Because I'm arguing that what matters in software applications is the combination of keywords within the context of a solution and not their quantity used in a program, I need to explain what context means. This is not a trivial task because the context of an application is attached to the problem being solved, and every problem is different and needs a specific program to solve it. (Don't confuse this with reusable code; even if a framework is used to solve a problem, the solution will be unique in its own way.)

Although a program could be syntactically correct, it doesn't mean that the algorithms implemented solve the problem at hand. What's more, a correct program can solve the wrong problem. Let's say we have the simple requirement of printing "Hello, World!" A syntactically correct solution in Java looks as follows:

public class SayHello {
    public static void main(String[] args) {
        System.out.println("Jose Sandoval!");
    }
}
This solution is obviously wrong because it doesn't solve the original requirement. This means that the context of the solution within the problem being solved needs to be determined to ensure its quality. In other words, we need to verify that the output matches the original requirement.
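Verifying the output against the requirement can itself be automated. The following is a minimal sketch, not the only way to do it: the class name SayHelloCheck and its methods are hypothetical, and the program under test is the same deliberately wrong one, rewritten to print to a stream that the check can capture.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class SayHelloCheck {
    // The original requirement, written down as data.
    static final String REQUIRED = "Hello, World!";

    // The program under test: syntactically correct, but it prints the wrong thing.
    static void sayHello(PrintStream out) {
        out.println("Jose Sandoval!");
    }

    // Runs the program against an in-memory stream and compares with the requirement.
    static boolean meetsRequirement() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        sayHello(new PrintStream(buffer, true));
        return buffer.toString().trim().equals(REQUIRED);
    }

    public static void main(String[] args) {
        System.out.println(meetsRequirement() ? "requirement met" : "requirement NOT met");
    }
}
```

The check fails here, which is the point: the compiler is satisfied, and only a comparison against the stated requirement exposes the problem.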

You could argue that the scenario I've presented here doesn't happen in real life; however, you would be surprised as to how often it actually does happen.

In the past, I have coded a perfect solution to a non-existent problem, and the issue wasn't the requirement gathering phase. The issue was that once the application became an executable program, the problem I was originally solving wasn't a problem anymore--the requirements had evolved. It took another iteration to get the right solution.

This is a valuable lesson, and one that I wish every developer I come in contact with had learned the hard way (by coding the correct solution to a non-existent problem). There's no finger pointing in the process; what interests me is what he or she does now to prevent the same mistake.

It's becoming clear that Zipf's Law, as I'm using it here to count keywords, has nothing to say about my application above. Zipf's Law can't even say much about larger systems I've written if all I'm doing is grouping program statements. Interesting patterns, however, begin to emerge when you look at an aggregate body of source files. The most important are reusability, coupling, and proper encapsulation. The length of classes and functions becomes important and it's easy to argue for short versus long functions--typically, a long function is problematic to debug and maintain, and almost impossible to extend.

So modern software engineering university courses preach the three main tenets of object oriented development as the silver bullet for our woes. Zipf's Law seems to support this practice, as shown in the paper Understanding the Shape of Java Software. This research looks at large software projects to try to understand what makes a system successful, and the authors seem to have found commonalities in the things that we've come to accept as good design principles of software systems.

We seem to take the three tenets of object oriented engineering at face value; however, there are solid theoretical reasons why they hold true in large scale systems. Yes, proper OOD is the way to go.

Coming back to my paraphrasing of Anna Karenina's opening line: successful projects do seem to have common processes; however, unsuccessful projects will be unsuccessful in their own ways.


9:25 PM | 0 comment(s) |


Ronaldinho is back...
Thursday, January 28, 2010

Yes, he's back. I didn't think he could recover his fitter form, but playing in the World Cup is probably enough incentive for anybody. Look at him go. Great moves and great goals.



11:02 PM | 0 comment(s) |


Java vs. AS3 coding styles
Friday, January 01, 2010

Coding styles evolve with the times, and there are as many of them as there are developers and programming languages. Where do coding styles come from?

We all have our ingrained way of writing and formatting code. The coding styles I've followed throughout these years come from my time in university while an undergrad student (University of Waterloo). Our programming assignments had specific requirements to follow (remember those pre/post statements for every function?). Depending on the programming class and project, skeletons of code were given to be filled with the real assignment. Because of this, I learned to write and comment code like my TAs and professors.

It was a vicious cycle: because everyone came from the same school, we all did the same things, for example, where the curly braces went, how many spaces between each line were required, where class members went. Coincidentally, the cycle continued into our co-op terms and first jobs around the Waterloo area: a large percentage of the senior developers in the software companies where we started our careers were also from Waterloo--and I believe that hasn't changed. We were a happy, uniformly trained coding family (mind you that this was a good thing).

I now have my own coding style and I notice when other developers do different things from what I do. Lately, I've been going through a lot of AS3 code and I have noticed, among other things, that curly braces are placed on their own lines. For example, a class definition may look as follows:
public class Button extends UIComponent
{
    public function Button()
    {
        super();
    }
}
There's nothing wrong with this class definition or the syntax. However, most, if not all, Flash developers follow this style. Why? Where did it come from? Do they use it because every other Flash developer codes with this style and all the code samples they got a hold of when learning the language looked like this?

I think this is it. Code is cheap on the internet: someone will post a piece of code and someone else will use, borrow, and steal it (it's the way of the modern developer).

I've used a few languages in the past and I never liked this particular coding style--a whole line for a curly brace. Because most of the apps I've worked on are Java enterprise apps, I adhere, almost religiously, to Sun's Code Conventions for the Java Programming Language. I'm not a zealot, though I like to know that there are other programmers out there who adhere to the same style standards I do and, therefore, I will know how to navigate their code when the time for maintenance comes--and that time will come.

By the way, the Button class above should be written as:
public class Button extends UIComponent {
    public function Button() {
        super();
    }
}
If the code works, does it make a difference where the curly braces go? As with everything, it depends on how you look at it. I think my way is a more concise way of writing code. I can have a few more lines in front of me per screen page. I know many developers who like to have a lot of code on the screen at one time as well. What's more, vertical real estate keeps shrinking, depending on the display you use. I'm a laptop user (ThinkPad X200), and the form factor keeps getting wider but shorter. Who said that a larger horizontal form factor is better? Maybe it's the way we're using our computers--entertainment systems--and we need a wider aspect ratio for better movie quality.

So, in the world of AS3 development, I doubt I will convince these programmers to write more compact code a la C, C++, Java, C#. I've asked a couple of them why they code that way, and the answer I get sounds the same every time: "it's the right way of doing it." Furthermore, like me, they can only guess where their coding style comes from--school, books, code samples.

Most samples of AS3 and Flex code I have found on the net have this elongated, vertically wasteful style. And because of our need to borrow and steal code from other developers, I think future AS3 and Flex code will continue to waste screen real estate with those extra carriage returns.

I'll do my best to change this habit, and this I promise: every sample of AS3 or Flex code I'll ever publish will have the more compact coding style familiar to C, C++, Java, C# developers.

On a final note, please use curly braces for one-line statements inside for, while, if, or else blocks. Yes to this:
if (true) {
    doSomethingAwesome();
} else {
    doSomethingAwful();
}
No to this:
if (true)
    doSomethingAwesome();
else
    doSomethingAwful();
And never to this:
if (true) doSomethingAwesome(); else doSomethingAwful();
What about scripting languages that don't require braces or semicolons? That's a whole different post.

Finally, yes, I know other C, C++, Java, C# developers use the elongated style. To them, I say, stop it.


8:52 PM | 5 comment(s) |


AS3 and anonymous event handlers
Tuesday, December 29, 2009

Event listener models are not new; however, AS3 event listeners are new to me.

I've developed a couple of applications in Java Swing and Eclipse RCP, and both frameworks rely heavily on the event dispatch/listener model. AS3, being a full-fledged object-oriented programming language, also uses it.

This week I continue fixing a few bugs for one of the Flash projects I'm working on. Our application aggregates content from Flickr, YouTube, and Twitter and displays it all in one place. Everything coming into the page is piped through REST calls (HTTP GET requests) as XML structures. Making GET requests and consuming XML structures is easy in Flash, because AS3 provides APIs for HTTP connections and XML parsing.

To load an image in our application we need to make 2 Flickr API calls: first, get the sizing information of a particular image; second, load the desired size of that image. In both cases, I need to make an HTTP call and then wait for the response to complete, for which I need 2 event listeners registered. Registering event listeners is all well and good, but I wanted to do something I've been doing for a while in Java--creating anonymous event handlers on the spot.

If you were wondering how to do this, the following snippet of code demonstrates it:
function loadImage(FLICKR_KEY:String, PHOTO_ID:String):void {
    // First call: ask Flickr for the available sizes of the photo.
    var sizeURLRequest:URLRequest =
        new URLRequest("http://api.flickr.com/"
            + "services/rest/?method=flickr.photos.getSizes"
            + "&api_key="
            + FLICKR_KEY
            + "&photo_id="
            + PHOTO_ID);
    var sizeURLLoader:URLLoader = new URLLoader();
    // Anonymous listener: runs once the sizes response has fully loaded.
    sizeURLLoader.addEventListener(Event.COMPLETE, function (event:Event):void {
        var sizeXML:XML = XML(event.target.data);
        // size[4] picks the fifth <size> entry in the response.
        var targetURL:URLRequest = new URLRequest(sizeXML[0].sizes[0].size[4].@source);

        // Second call: load the image itself (imageLoader is assumed to be
        // declared elsewhere, e.g. as a class member).
        imageLoader = new Loader();
        imageLoader.contentLoaderInfo.addEventListener(Event.COMPLETE, showImage);
        imageLoader.load(targetURL);
    });

    sizeURLLoader.load(sizeURLRequest);
}
The anonymous listener is the function passed as the second argument in the statement sizeURLLoader.addEventListener(Event.COMPLETE, function (event:Event):void { ... }). This is analogous to implementing an interface as an anonymous class, which will be familiar to you if you are a Java developer.
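For comparison, here is roughly what the same pattern looks like with Java anonymous classes. This is a sketch under stated assumptions: CompleteListener and FakeLoader are hypothetical stand-ins invented for this example (they are not part of any Flash or Java API), and the "loading" is simulated in memory so the example runs without a network.

```java
import java.util.ArrayList;
import java.util.List;

class AnonymousListenerDemo {
    // Hypothetical single-method listener interface, standing in for an AS3 event handler.
    interface CompleteListener {
        void onComplete(String data);
    }

    // Hypothetical loader that notifies its listeners when "loading" finishes.
    static class FakeLoader {
        private final List<CompleteListener> listeners = new ArrayList<>();

        void addCompleteListener(CompleteListener listener) {
            listeners.add(listener);
        }

        // Simulates a completed request by handing the payload to every listener.
        void load(String payload) {
            for (CompleteListener listener : listeners) {
                listener.onComplete(payload);
            }
        }
    }

    static List<String> run() {
        final List<String> log = new ArrayList<>();
        FakeLoader sizeLoader = new FakeLoader();
        // First handler, declared on the spot as an anonymous class.
        sizeLoader.addCompleteListener(new CompleteListener() {
            @Override
            public void onComplete(String sizeUrl) {
                log.add("sizes loaded: " + sizeUrl);
                // The second request is started from inside the first handler.
                FakeLoader imageLoader = new FakeLoader();
                imageLoader.addCompleteListener(new CompleteListener() {
                    @Override
                    public void onComplete(String image) {
                        log.add("image loaded: " + image);
                    }
                });
                imageLoader.load("bytes-of-" + sizeUrl);
            }
        });
        sizeLoader.load("photo_medium.jpg");
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

As in the AS3 version, the second request only starts once the first one completes, and each handler sits next to the call that registers it.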

Note that showImage could also be implemented as an anonymous listener, but in this case there's too much logic to display the image, and therefore it is a separate function that looks as follows:
private function showImage($event:Event):void {
    // Display image here...
}


11:47 PM | 0 comment(s) |


RESTful Flex/Flash client
Saturday, December 26, 2009

This week I had to program in AS3. I didn't code from scratch the application I worked on; however, I designed the architecture, so I was familiar with the source and felt confident to jump in to do updates. In the process, I was introduced to the default Flash development environment--CS4.

Coming from a world of Vim, Eclipse, NetBeans, and Visual Studio, I found the IDE lacking in functionality. True enough, I can compile code and export everything into SWF files. But what about the little extras that make developing software fun? Where's the vi plugin? Where's the build file? What about code completion?

A couple of Flash developers I know suggested I try Flash Builder, which is based on Eclipse. I downloaded it and coded my first application--a RESTful Flex client. My first application, however, is not a full AS3 app; it's a Flex app coded in a language that is a hybrid of XML and AS3.

The application, which I call TwitterFlex, looks as follows:



The running version is here TwitterFlex: you click the button and it connects to Twitter's REST API to retrieve the latest 20 public updates.

Let me break down the code into 3 sections--XML stuff, AS3 code, and UI logic--because I think most Flex applications will have the same code structure that my toy example has.

XML Stuff: the web service connection
Connecting to web services through HTTP is such a common requirement that the Flex API already includes code to do just that.

Creating an HTTP call that connects to Twitter is done with the following XML code:
<?xml version="1.0" encoding="utf-8"?>
<mx:Application
    xmlns:mx="http://www.adobe.com/2006/mxml"
    layout="absolute"
    viewSourceURL="srcview/index.html">

(1) <mx:HTTPService
(2)     id="RESTService"
(3)     url="json.jsp"
(4)     resultFormat="text"
(5)     result="onLoadTweetsResult(event)"
(6)     fault="onLoadTweetsFault(event)"
(7)     showBusyCursor="true">
(8) </mx:HTTPService>
If you've seen XML files before, the first lines should look familiar: the xmlns:mx attribute declares the mx prefix, telling whatever parses this file that tags starting with mx: belong to Adobe's http://www.adobe.com/2006/mxml namespace.

Line (1) instantiates an HTTPService object; in line (2) I give the instance an id of RESTService. In line (3) I set the URL value of json.jsp (because of cross domain issues, I need to call a local pass through to talk to Twitter--this is a simple JSP and the code is at the end of this entry). Lines (5) and (6) point to the event handlers of the HTTP responses, with (5) handling success and (6) handling failure.

Note that I have a service instantiated, but I haven't called it yet. I defer the call until the user clicks a button (see the UI section below).

AS3 Code: the <mx:Script></mx:Script> tag
With the ability to make web service calls, I need to program the event handlers of the HTTP responses and any other functions needed for user interaction or for logic that runs as part of user requests. AS3 code is enclosed in the XML element <mx:Script>. The AS3 code in my application looks as follows:
<mx:Script>
    <![CDATA[
        import mx.controls.dataGridClasses.DataGridColumn;
        import mx.messaging.AbstractConsumer;
        import mx.rpc.events.ResultEvent;
        import mx.rpc.events.FaultEvent;
        import mx.collections.ArrayCollection;
        import com.adobe.serialization.json.JSON;

        [Bindable]
        private var tweets:ArrayCollection;

        private function loadTweets():void {
            RESTService.send();
        }

        private function
        onLoadTweetsResult(event:ResultEvent):void {
            var rawJSON:String = String(event.result);
            var arrayJSON:Array = JSON.decode(rawJSON) as Array;
            tweets = new ArrayCollection(arrayJSON);
        }

        private function
        onLoadTweetsFault(event:FaultEvent):void {
            trace(event.fault.toString());
        }

        private function
        getScreenName(tweet:Object,
                      column:DataGridColumn):String {
            return tweet.user.screen_name;
        }

        private function
        getName(tweet:Object,
                column:DataGridColumn):String {
            return tweet.user.name;
        }
    ]]>
</mx:Script>
The import statements should be obvious. Next, however, is the [Bindable] statement just above the tweets variable. As per Adobe's documentation, this metadata tag marks the variable as a source for data binding: whenever tweets is assigned a new value, an event is dispatched so that anything bound to it is updated. In short, [Bindable] doesn't make tweets global; it makes changes to tweets observable, which is what lets the DataGrid refresh when new data arrives.
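If you come from Java, the closest standard analogue I can think of is java.beans.PropertyChangeSupport. The sketch below (assuming Java 8 or later) is only an analogy and says nothing about how Flex implements binding internally; TweetModel and the list that plays the role of the grid are hypothetical names made up for the example.

```java
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeSupport;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BindableDemo {
    // Hypothetical model whose "tweets" property announces every change.
    static class TweetModel {
        private final PropertyChangeSupport support = new PropertyChangeSupport(this);
        private List<String> tweets = new ArrayList<>();

        void addPropertyChangeListener(PropertyChangeListener listener) {
            support.addPropertyChangeListener(listener);
        }

        void setTweets(List<String> newTweets) {
            List<String> old = tweets;
            tweets = newTweets;
            // Notifies listeners; no event is fired when old and new values are equal.
            support.firePropertyChange("tweets", old, newTweets);
        }
    }

    @SuppressWarnings("unchecked")
    static List<String> run() {
        final List<String> rendered = new ArrayList<>();
        TweetModel model = new TweetModel();
        // The "grid": re-renders itself whenever the observed property changes.
        model.addPropertyChangeListener(event -> {
            rendered.clear();
            rendered.addAll((List<String>) event.getNewValue());
        });
        model.setTweets(Arrays.asList("first tweet", "second tweet"));
        return rendered;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The listener never polls the model; it is told when the property changes, which is the behaviour the [Bindable] tag gives the tweets variable.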

Next, comes the loadTweets() function, which tells the web service I defined earlier to run by executing the send() method of the HTTPService object.

Handling of the HTTP responses is delegated to the onLoadTweetsResult() and onLoadTweetsFault() methods. The former is where Twitter's JSON object is parsed using a JSON library that is available for download. Before you can use it, first download it and then add it to your Flex Builder project's library (I thought it was a default Adobe package, but it's not--let me save you some time here).

Finally, the last 2 methods, getScreenName() and getName(), return the value of the fields in the JSON object that I use in the UI components of the app, which I cover next.

The UI
The last portion of the code is the UI of the application. I won't cover the details of every XML tag available, because there are already many examples of this out there. What's more, I only use 5 UI elements: a VBox, a Label, a DataGrid, a DataGridColumn, and a Button. My UI, in code, looks as follows:
    <mx:VBox
        width="100%"
        height="50%"
        paddingBottom="60"
        paddingLeft="60"
        paddingRight="60"
        paddingTop="60">

        <mx:Label
            text="RESTful Flex/Flash client (jose@josesandoval.com)"
            fontSize="24"
            fontWeight="bold" />

(1)     <mx:DataGrid
            dataProvider="{tweets}"
            width="100%"
            rowCount="12">

(2)         <mx:columns>
(3)             <mx:DataGridColumn
                    width="200"
                    headerText="Screen Name"
                    labelFunction="getScreenName" />

(4)             <mx:DataGridColumn
                    width="200"
                    headerText="Name"
                    labelFunction="getName" />

(5)             <mx:DataGridColumn
                    headerText="Tweet"
                    dataField="text" />

            </mx:columns>
        </mx:DataGrid>

        <mx:Button
            label="Get Tweets"
            click="{RESTService.send()}" />

    </mx:VBox>
</mx:Application>
The only lines I will cover in detail are numbered. Everything above and below them is obvious.

Line (1) instantiates a DataGrid object from the mx namespace. The tag's attribute dataProvider="{tweets}" binds the grid to the bindable tweets variable I declared in the AS3 code--you can see how the event dispatching makes sense in the context of the application: if the state of tweets changes, every component that is using it has to be notified.

Lines (3) and (4) define columns in the grid. The attribute labelFunction tells the column to run the function named in the attribute's value to compute the label of each cell. For example, getScreenName calls the function coded earlier, getScreenName(), and getName calls getName(). If you look at the functions above, you see that I'm accessing the user element of the parsed JSON object.

What about the parameter column:DataGridColumn in the method's signature? That's part of the signature the DataGrid expects of a label function: it calls the function with the row's data item and the column being rendered, so one function could serve several columns if needed. My functions simply ignore it.

And finally, line (5) doesn't use a function callback. The DataGrid renders one row per item in tweets, and dataField="text" tells the column to display each item's text property directly.
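The difference between labelFunction and dataField can be sketched in a few lines of Java (assuming Java 8 or later). Everything here is hypothetical and written only to illustrate the idea: Column is not the Flex DataGridColumn class, and a Map stands in for the parsed JSON object of a tweet.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

class LabelFunctionDemo {
    // Hypothetical column: uses labelFunction when set, otherwise reads dataField from the row.
    static class Column {
        final String headerText;
        final String dataField;
        final BiFunction<Map<String, Object>, Column, String> labelFunction;

        Column(String headerText, String dataField,
               BiFunction<Map<String, Object>, Column, String> labelFunction) {
            this.headerText = headerText;
            this.dataField = dataField;
            this.labelFunction = labelFunction;
        }

        String labelFor(Map<String, Object> row) {
            if (labelFunction != null) {
                // The grid passes both the row item and the column being rendered.
                return labelFunction.apply(row, this);
            }
            return String.valueOf(row.get(dataField));
        }
    }

    // Counterpart of getScreenName(): digs into the nested user object.
    @SuppressWarnings("unchecked")
    static String getScreenName(Map<String, Object> tweet, Column column) {
        Map<String, Object> user = (Map<String, Object>) tweet.get("user");
        return String.valueOf(user.get("screen_name"));
    }

    // Made-up sample data shaped like one parsed tweet.
    static Map<String, Object> sampleTweet() {
        Map<String, Object> user = new HashMap<>();
        user.put("screen_name", "jose");
        Map<String, Object> tweet = new HashMap<>();
        tweet.put("user", user);
        tweet.put("text", "Hello from Flex");
        return tweet;
    }

    public static void main(String[] args) {
        Column screenName = new Column("Screen Name", null, LabelFunctionDemo::getScreenName);
        Column text = new Column("Tweet", "text", null);
        Map<String, Object> tweet = sampleTweet();
        System.out.println(screenName.labelFor(tweet));
        System.out.println(text.labelFor(tweet));
    }
}
```

A label function is needed only when the value is nested or computed, as with the user's screen name; a top-level field such as text can be named directly.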

Final Words
This XML and AS3 code hybrid is the next evolution of computer languages. Flex is a compilable meta-language, with AS3 scripting capabilities, that lets us take advantage of the ubiquity of the Flash player. Flex apps are Flash apps and will run on any browser that has a Flash player installed.

Even though I'm liking coding in Flash and this new meta-computer language, I wonder why we insist on recreating all the functionality of the web browser in Flash applications. Flash is cool and all, but we can do most of what it does in plain HTML and JavaScript code.



If you want to see the whole listing in one place, Flex gives you the option of attaching the source code to your deployed applications. The source for this app is here TwitterFlex/srcview/index.html, which you can also access by right-clicking on the application and then selecting "View Source."

json.jsp
I mentioned earlier that I use a JSP file to serve as a proxy to talk to Twitter from the hosting server. This is because you can't make direct calls from a Flash app to Twitter unless you are a registered user. I don't have a developer's API key, and for this example I still want to use the public stream. The JSP file looks as follows:
<%@ page contentType="application/json; charset=UTF-8" %>
<%@ page import="java.io.BufferedReader,
                 java.io.IOException,
                 java.io.InputStreamReader,
                 java.net.MalformedURLException,
                 java.net.URL,
                 java.net.URLConnection" %>
<%
    // Declared outside the try block so the finally block can see it.
    BufferedReader in = null;
    try {
        URL twitter =
            new URL(
                "http://twitter.com/statuses/public_timeline.json");
        URLConnection tc = twitter.openConnection();
        in = new BufferedReader(
            new InputStreamReader(tc.getInputStream(), "UTF-8"));
        String line;

        // Relay the response line by line to the caller of the JSP.
        while ((line = in.readLine()) != null) {
            out.println(line);
        }
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // in is still null if the connection could not be opened.
        if (in != null) {
            in.close();
        }
    }
%>
I open a connection and make an HTTP GET call. I then stream the result back to the caller of the JSP, but I set the data stream to have a MIME type of application/json.
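The read-and-relay loop at the heart of the proxy can also be written as plain Java with try-with-resources, which closes the reader automatically. This is a sketch assuming Java 7 or later; LineRelay and its methods are hypothetical names, and a StringReader stands in for the network stream so that the example runs offline.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

class LineRelay {
    // Copies every line from source to sink; the reader is closed even if an exception is thrown.
    static void relay(Reader source, Writer sink) throws IOException {
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                sink.write(line);
                sink.write('\n');
            }
        }
    }

    // Convenience wrapper for in-memory input, used here in place of a real HTTP response.
    static String relayToString(String input) {
        StringWriter out = new StringWriter();
        try {
            relay(new StringReader(input), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(relayToString("[{\"text\":\"hi\"}]"));
    }
}
```

In a servlet or JSP, the source would wrap the URL connection's input stream and the sink would be the response writer.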


11:31 AM | 2 comment(s) |


© Jose Sandoval 2004-2009 jose@josesandoval.com