Monday, September 8, 2008

Pessimistic and optimistic locking

Transactional isolation is usually implemented by locking whatever is accessed in a transaction. There are two different approaches to transactional locking: Pessimistic locking and optimistic locking.

The disadvantage of pessimistic locking is that a resource is locked from the time it is first accessed in a transaction until the transaction is finished, making it inaccessible to other transactions during that time. If most transactions simply look at the resource and never change it, an exclusive lock may be overkill, as it may cause lock contention, and optimistic locking may be a better approach. With pessimistic locking, locks are applied in a fail-safe way. In the banking application example, an account is locked as soon as it is accessed in a transaction. Attempts to use the account in other transactions while it is locked will either result in the other transaction being delayed until the account lock is released, or in that other transaction being rolled back. The lock is held until the transaction is either committed or rolled back.
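As a rough illustration, here is a minimal JDBC sketch of the pessimistic approach, assuming an ACCOUNTS table like the one used in the isolation-level discussion further down this page; the connection details and account number are placeholders, and SELECT ... FOR UPDATE is the Oracle-style way to lock the row as it is read.

import java.sql.*;

public class PessimisticTransfer {
    public static void main(String[] args) throws SQLException {
        // Placeholder connection details; any database that supports
        // SELECT ... FOR UPDATE will behave similarly.
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/XE", "scott", "tiger")) {
            con.setAutoCommit(false);
            try {
                // Lock the row as soon as it is read; other transactions that
                // try to lock or update it will block until we finish.
                double balance;
                try (PreparedStatement lock = con.prepareStatement(
                        "select account_balance from accounts " +
                        "where account_number = ? for update")) {
                    lock.setInt(1, 123);
                    try (ResultSet rs = lock.executeQuery()) {
                        rs.next();
                        balance = rs.getDouble(1);
                    }
                }
                // Work with the locked row, then write the new balance.
                try (PreparedStatement upd = con.prepareStatement(
                        "update accounts set account_balance = ? " +
                        "where account_number = ?")) {
                    upd.setDouble(1, balance - 400);
                    upd.setInt(2, 123);
                    upd.executeUpdate();
                }
                con.commit();   // the lock is released here
            } catch (SQLException e) {
                con.rollback(); // ... or here
                throw e;
            }
        }
    }
}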

With optimistic locking, a resource is not actually locked when it is first accessed by a transaction. Instead, the state of the resource at the time when it would have been locked with the pessimistic locking approach is saved. Other transactions are able to access the resource concurrently, so conflicting changes are possible. At commit time, when the resource is about to be updated in persistent storage, the state of the resource is read from storage again and compared to the state that was saved when the resource was first accessed in the transaction. If the two states differ, a conflicting update was made, and the transaction will be rolled back.

In the banking application example, the amount of an account is saved when the account is first accessed in a transaction. If the transaction changes the account amount, the amount is read from the store again just before it is about to be updated. If the amount has changed since the transaction began, the transaction fails and is rolled back; otherwise the new amount is written to persistent storage.
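Here is a corresponding optimistic sketch under the same assumptions, using a compact variant of the check described above: instead of re-reading and comparing in two steps, the balance saved at the start is repeated in the WHERE clause of the UPDATE, so the write succeeds only if nobody else changed the row in the meantime.

import java.sql.*;

public class OptimisticTransfer {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/XE", "scott", "tiger")) {
            con.setAutoCommit(false);
            // Read the current balance without taking a lock.
            double original;
            try (PreparedStatement read = con.prepareStatement(
                    "select account_balance from accounts where account_number = ?")) {
                read.setInt(1, 123);
                try (ResultSet rs = read.executeQuery()) {
                    rs.next();
                    original = rs.getDouble(1);
                }
            }
            // At commit time, only update the row if it still has the value we read.
            try (PreparedStatement upd = con.prepareStatement(
                    "update accounts set account_balance = ? " +
                    "where account_number = ? and account_balance = ?")) {
                upd.setDouble(1, original - 400);
                upd.setInt(2, 123);
                upd.setDouble(3, original);
                if (upd.executeUpdate() == 1) {
                    con.commit();   // no conflicting change: keep our update
                } else {
                    con.rollback(); // someone else changed the row: start over
                    System.out.println("Conflicting update detected; transaction rolled back");
                }
            }
        }
    }
}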

On Transaction Isolation Levels

The ANSI/ISO SQL standard defines four levels of transaction isolation, with different possible outcomes for the same transaction scenario. That is, the same work performed in the same fashion with the same inputs may result in different answers, depending on your isolation level. These levels are defined in terms of three phenomena that are either permitted or not at a given isolation level:

  • Dirty read: The meaning of this term is as bad as it sounds. You're permitted to read uncommitted, or dirty, data. You can achieve this effect by just opening an OS file that someone else is writing and reading whatever data happens to be there. Data integrity is compromised, foreign keys are violated, and unique constraints are ignored.
  • Nonrepeatable read: This simply means that if you read a row at time T1 and try to reread that row at time T2, the row may have changed. It may have disappeared, it may have been updated, and so on.
  • Phantom read: This means that if you execute a query at time T1 and re-execute it at time T2, additional rows may have been added to the database, which may affect your results. This differs from a nonrepeatable read in that with a phantom read, data you already read hasn't been changed, but instead, more data satisfies your query criteria than before.

Note that the ANSI/ISO SQL standard defines transaction-level characteristics, not just individual statement-by-statement-level characteristics. I'll examine transaction-level isolation, not just statement-level isolation.

The SQL isolation levels are defined based on whether they allow each of the preceding phenomena. It's interesting to note that the SQL standard doesn't impose a specific locking scheme or mandate particular behaviors, but rather describes these isolation levels in terms of these phenomena—allowing for many different locking/concurrency mechanisms to exist (see Table 1).

Table 1: ANSI isolation levels
Isolation Level Dirty Read Nonrepeatable Read Phantom Read
READ UNCOMMITTED Permitted Permitted Permitted
READ COMMITTED -- Permitted Permitted
REPEATABLE READ -- -- Permitted
SERIALIZABLE -- -- --

Oracle explicitly supports the READ COMMITTED and SERIALIZABLE isolation levels as they're defined in the standard. However, this doesn't tell the whole story. The SQL standard was trying to set up isolation levels that would permit various degrees of consistency for queries performed at each level. REPEATABLE READ is the isolation level that the SQL standard claims will guarantee a read-consistent result from a query. In the SQL standard definition, READ COMMITTED doesn't give you consistent results, and READ UNCOMMITTED is the level to use to get nonblocking reads.

However, in Oracle Database, READ COMMITTED has all of the attributes required to achieve read-consistent queries. In other databases, READ COMMITTED queries can and will return answers that never existed in the database. Moreover, Oracle Database also supports the spirit of READ UNCOMMITTED. The goal of providing a dirty read is to supply a nonblocking read, whereby queries are not blocked by, and do not block, updates of the same data. However, Oracle Database doesn't need dirty reads to achieve this goal, nor does it support them. Dirty reads are an implementation other databases must use to provide nonblocking reads.

In addition to the four defined SQL isolation levels, Oracle Database provides another level: READ ONLY. A READ ONLY transaction is equivalent to a REPEATABLE READ or SERIALIZABLE transaction that cannot perform any modifications in SQL. A transaction using a READ ONLY isolation level sees only those changes that were committed at the time the transaction began. Inserts, updates, and deletes aren't permitted in this mode (other sessions may update data, but not the READ ONLY transaction). Using this mode, you can achieve REPEATABLE READ and SERIALIZABLE levels of isolation.
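As a small, hedged illustration of how a session might ask for this mode, the sketch below issues Oracle's SET TRANSACTION READ ONLY as the first statement of a transaction from JDBC; the connection details and the query are placeholders.

import java.sql.*;

public class ReadOnlyReport {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/XE", "scott", "tiger");
             Statement stmt = con.createStatement()) {
            con.setAutoCommit(false);
            // Must be the first statement of the transaction.
            stmt.execute("set transaction read only");

            // Every query in this transaction now sees the data as it was
            // when the transaction began, no matter how long it runs.
            try (ResultSet rs = stmt.executeQuery(
                    "select sum(account_balance) from accounts")) {
                rs.next();
                System.out.println("Total: " + rs.getDouble(1));
            }
            con.commit(); // ends the read-only transaction
        }
    }
}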

I'll now move on to discuss exactly how multiversioning and read consistency fit into the isolation scheme and how databases that don't support multiversioning achieve the same results. This information is helpful for anyone who has used another database and believes he or she understands how the isolation levels must work. It's also interesting to see how ANSI/ISO SQL, the standard that was supposed to remove the differences between the databases, actually allows for them. This standard, while very detailed, can be implemented in very different ways.

READ UNCOMMITTED. The READ UNCOMMITTED isolation level allows dirty reads. Oracle Database doesn't use dirty reads, nor does it even allow them. The basic goal of a READ UNCOMMITTED isolation level is to provide a standards-based definition that allows for nonblocking reads. As you've seen, Oracle Database provides for nonblocking reads by default. You'd be hard-pressed to make a SELECT query block and wait in the database (as noted earlier, there is the special case of a distributed transaction). Every single query, be it a SELECT, INSERT, UPDATE, MERGE, or DELETE, executes in a read-consistent fashion. It might seem funny to refer to an UPDATE statement as a query, but it is. UPDATE statements have two components: a read component as defined by the WHERE clause, and a write component as defined by the SET clause. UPDATE statements read and write to the database, as do all DML statements. The case of a single row INSERT using the VALUES clause is the only exception to this, because such statements have no read component—just the write component.

In Chapter 1 [of Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions], I demonstrated Oracle Database's method of obtaining read consistency by way of a simple single table query, which retrieved rows that were deleted after the cursor was opened. I'm now going to explore a real-world example to see what happens in Oracle Database using multiversioning, as well as what happens in any number of other databases.

I'll start with that same basic table and query:

create table accounts
( account_number number primary key,
account_balance number not null
);

select sum(account_balance)
from accounts;

Before the query begins, I have the data shown in Table 2.

Table 2: ACCOUNTS table before modifications
Row Account Number Account Balance
1 123 $500.00
2 456 $240.25
... ... ...
342,023 987 $100.00

Now, my SELECT statement starts executing and reads row 1, row 2, and so on. At some point while this query is in the middle of processing, a transaction moves $400 from account 123 to account 987. This transaction does the two updates but does not commit. The ACCOUNTS table now looks as shown in Table 3.

Table 3: ACCOUNTS table during modifications
Row Account Number Account Balance Locked?
1 123 ($500.00) changed to $100.00 X
2 456 $240.25 --
... ... ... --
342,023 987 ($100.00) changed to $500.00 X

As Table 3 shows, two of those rows are locked. If anyone tried to update them, they'd be blocked. So far, the behavior I'm seeing is more or less consistent across all databases. The difference will be in what happens when the query gets to the locked data.

When the query I'm executing gets to the block containing the locked row (row 342,023) at the bottom of the table, it will notice that the data in it has changed since execution began. To provide a consistent, or correct, answer, Oracle Database will create a copy of the block containing this row as it existed when the query began. That is, it will read a value of $100, which is the value that existed when the query began. Effectively, Oracle Database takes a detour around the modified data—it reads around it, reconstructing it from the undo (also known as a rollback) segment. A consistent and correct answer comes back without waiting for the transaction to commit.

Now, a database that allowed a dirty read would simply return the value it saw in account 987 at the time it read it—in this case, $500. The query would count the transferred $400 twice. Therefore, not only does it return the wrong answer, but also it returns a total that never existed in the table. In a multiuser database, a dirty read can be a dangerous feature. Personally, I've never seen the usefulness of it. Say that, rather than transferring, the transaction was actually just depositing $400 in account 987. The dirty read would count the $400 and get the "right" answer, wouldn't it? Well, suppose the uncommitted transaction was rolled back. I've just counted $400 that was never actually in the database.

The point here is that dirty read is not a feature; rather, it's a liability. In Oracle Database, it's just not needed. You get all of the advantages of a dirty read—no blocking—without any of the incorrect results.

READ COMMITTED. The READ COMMITTED isolation level states that a transaction may read only data that has been committed in the database. There are no dirty reads (reads of uncommitted data). There may be nonrepeatable reads (that is, rereads of the same row may return a different answer in the same transaction) and phantom reads (that is, newly inserted and committed rows that weren't visible earlier in the transaction become visible to a query). READ COMMITTED is perhaps the most commonly used isolation level in database applications everywhere, and it's the default mode for Oracle Database. It's rare to see a different isolation level used in Oracle databases.

However, achieving READ COMMITTED isolation is not as cut-and-dried as it sounds. If you look at Table 1, it looks straightforward. Obviously, given the earlier rules, a query executed in any database using the READ COMMITTED isolation will behave in the same way, right? No, it won't. If you query multiple rows in a single statement in almost any other database, READ COMMITTED isolation can be as bad as a dirty read, depending on the implementation.

In Oracle Database, using multi-versioning and read-consistent queries, the answer I get from the ACCOUNTS query is the same in the READ COMMITTED example as it was in the READ UNCOMMITTED example. Oracle Database will reconstruct the modified data as it appeared when the query began, returning the answer that was in the database when the query started.
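Since READ COMMITTED is the default, an Oracle client doesn't need to ask for it, but for completeness here is a small sketch of requesting an isolation level through the standard JDBC API (Oracle accepts READ COMMITTED and SERIALIZABLE); the connection details are placeholders.

import java.sql.*;

public class IsolationDemo {
    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/XE", "scott", "tiger")) {
            con.setAutoCommit(false);

            // Oracle's default, shown explicitly here.
            con.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
            runAccountsQuery(con);
            con.commit();

            // Request transaction-level consistency instead.
            con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
            runAccountsQuery(con);
            con.commit();
        }
    }

    private static void runAccountsQuery(Connection con) throws SQLException {
        try (Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "select sum(account_balance) from accounts")) {
            rs.next();
            System.out.println("Total: " + rs.getDouble(1));
        }
    }
}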

Now I'll take a look at how my previous example might work in READ COMMITTED mode in other databases. You might find the answer surprising. I'll pick up my example at the point described in Table 3:

  • I'm in the middle of the table. I've read and summed the first N rows.
  • The other transaction has moved $400 from account 123 to account 987.
  • The transaction has not yet committed, so rows containing the information for accounts 123 and 987 are locked.

I know what happens in Oracle Database when it gets to account 987: It will read around the modified data, find out it should be $100, and complete. Table 4 shows how another database, running in some default READ COMMITTED mode, might arrive at the answer.

Table 4: Timeline in a non-Oracle database using READ COMMITTED isolation
Time Query Account Transfer Transaction
T1 Reads row 1. Sum = $500.00 so far. --
T2 Reads row 2. Sum = $740.25 so far. --
T3 -- Updates row 1 and puts an exclusive lock on row 1, preventing other updates and reads. Row 1 now has $100.00.
T4 Reads row N. Sum = . . . --
T5 -- Updates row 342,023 and puts an exclusive lock on this row. Row now has $500.00.
T6 Tries to read row 342,023 and discovers that it is locked. This session will block and wait for this block to become available. All processing on this query stops. --
T7 -- Commits transaction.
T8 Reads row 342,023, sees $500.00, and presents a final answer that includes the $400.00 double-counted.

The first thing to notice in Table 4 is that this other database, upon reading account 987, will block my query. This session must wait on that row until the transaction holding the exclusive lock commits. This is one reason why many people have a bad habit of committing every statement, instead of processing well-formed transactions consisting of all of the statements needed to take the database from one consistent state to the next. Updates interfere with reads in most other databases. The really bad news in this scenario is that I'm making the end user wait for the wrong answer. I still receive an answer that never existed in the database, as with the dirty read, but this time I made the user wait for the wrong answer. In the next section, I'll look at what these other databases must do to achieve read-consistent, correct results.

The lesson here is that various databases executing in the same, apparently safe isolation level can and will return very different answers under the same circumstances. It's important to understand that, in Oracle Database, nonblocking reads do not come at the expense of correct answers. You can have your cake and eat it too, sometimes.

REPEATABLE READ. The goal of REPEATABLE READ is to provide an isolation level that gives consistent, correct answers and prevents lost updates. I'll show examples of what you must do in Oracle Database to achieve these goals and examine what happens in other systems. If you have REPEATABLE READ isolation, the results from a given query must be consistent with respect to some point in time. Most databases (not Oracle) achieve repeatable reads through the use of row-level, shared read locks. A shared read lock prevents other sessions from modifying data that you've read. This, of course, decreases concurrency. Oracle Database opted for the more concurrent, multiversioning model to provide read-consistent answers.

Using multiversioning in Oracle Database, you get an answer consistent with when the query began execution. In other databases, using shared read locks, you get an answer that's consistent with when the query completes—that is, when you can get the answer at all (more on this in a moment).

In a system that employs a shared read lock to provide repeatable reads, you'd observe rows in a table getting locked as the query processed them. So, using the earlier example, as my query reads the ACCOUNTS table, it'd leave shared read locks on each row, as shown in Table 5.

Table 5: Timeline 1 in a non-Oracle database using REPEATABLE READ isolation
Time Query Account Transfer Transaction
T1 Reads row 1. Sum = $500.00 so far. Block 1 has a shared read lock on it. --
T2 Reads row 2. Sum = $740.25 so far. Block 2 has a shared read lock on it. --
T3 -- Attempts to update row 1 but is blocked. Transaction is suspended until it can obtain an exclusive lock.
T4 Reads row N. Sum = . . . --
T5 Reads row 342,023, sees $100.00, and presents final answer. --
T6 Commits transaction. --
T7 -- Updates row 1 and puts an exclusive lock on this block. Row now has $100.00.
T8 -- Updates row 342,023 and puts an exclusive lock on this block. Row now has $500.00. Commits transaction.

Table 5 shows that I now get the correct answer, but at the cost of physically blocking one transaction and executing the two transactions sequentially. This is one of the side effects of shared read locks for consistent answers: Readers of data will block writers of data. This is in addition to the fact that, in these systems, writers of data will block readers of data. Imagine if ATMs worked this way in real life.

So, you've seen how shared read locks can inhibit concurrency; they can also cause spurious errors to occur. In Table 6, I start with my original table, but this time with the goal of transferring $50 from account 987 to account 123.

Table 6: Timeline 2 in a non-Oracle database using REPEATABLE READ isolation
Time Query Account Transfer Transaction
T1 Reads row 1. Sum = $500.00 so far. Block 1 has a shared read lock on it. --
T2 Reads row 2. Sum = $740.25 so far. Block 2 has a shared read lock on it. --
T3 -- Updates row 342,023 and puts an exclusive lock on block 342,023, preventing other updates and shared read locks. This row now has $50.00.
T4 Reads row N. Sum = . . . --
T5 -- Attempts to update row 1 but is blocked. Transaction is suspended until it can obtain an exclusive lock.
T6 Attempts to read row 342,023 but cannot, as an exclusive lock is already in place. --

I've just reached the classic deadlock condition. My query holds resources the update needs, and vice versa. My query has just deadlocked with my update transaction. One of them will be chosen as the victim and will be killed. I just spent a long time and a lot of resources only to fail and get rolled back at the end. This is the second side effect of shared read locks: Readers and writers of data can and frequently will deadlock each other.

As you've seen in Oracle Database, you have statement-level read consistency without reads blocking writes or deadlocks. Oracle Database never uses shared read locks—ever. Oracle has chosen the harder-to-implement but infinitely more concurrent multi-versioning scheme.

SERIALIZABLE. This is generally considered the most restrictive level of transaction isolation, but it provides the highest degree of isolation. A SERIALIZABLE transaction operates in an environment that makes it appear as if there are no other users modifying data in the database. Any row you read is assured to be the same upon a reread, and any query you execute is guaranteed to return the same results for the life of a transaction.

For example, if you execute—

select * from T;
begin dbms_lock.sleep( 60*60*24 ); end;
select * from T;

—the answers returned from T would be the same, even though you just slept for 24 hours (or you might get an ORA-1555, snapshot too old error). The isolation level assures you these two queries will always return the same results. Side effects, or changes, made by other transactions aren't visible to the query no matter how long it has been running.

In Oracle Database, a SERIALIZABLE transaction is implemented so that the read consistency you normally get at the statement level is extended to the transaction. (As noted earlier, there's also an isolation level in Oracle called READ ONLY. It has all of the qualities of the SERIALIZABLE isolation level, but it prohibits modifications. Note that the SYS user (or users connected as SYSDBA) cannot have a READ ONLY or SERIALIZABLE transaction. SYS is special in this regard.)

Instead of results being consistent with respect to the start of a statement, they're preordained when you begin the transaction. In other words, Oracle Database uses the rollback segments to reconstruct the data as it existed when your transaction began, instead of when your statement began. That's a pretty deep thought there: the database already knows the answer to any question you might ask it, before you ask it.

But this degree of isolation comes with a price, and that price is the following possible error:

ERROR at line 1:
ORA-08177: can't serialize access for this transaction

You'll get this message whenever you try to update a row that has changed since your transaction began. (Note that Oracle tries to do this purely at the row level, but you may receive an ORA-08177 error even when the row you're interested in modifying hasn't been modified. The ORA-08177 may happen due to some other row(s) being modified on the block that contains your row.)

Oracle Database takes an optimistic approach to serialization: it gambles on the fact that the data your transaction wants to update won't be updated by any other transaction. This is typically the way it happens, and usually the gamble pays off, especially in quick-transaction, OLTP-type systems. If no one else updates your data during your transaction, this isolation level, which will generally decrease concurrency in other systems, will provide the same degree of concurrency as it would without SERIALIZABLE transactions. The downside is that you may get the ORA-08177 error if the gamble doesn't pay off (a simple retry sketch follows the list below). If you think about it, however, it's worth the risk. If you're using SERIALIZABLE transactions, you shouldn't expect to update the same information as other transactions.

If you do, you should use the SELECT ... FOR UPDATE as described in Chapter 1 [of Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions]. This will serialize the access. So, you can effectively use an isolation level of SERIALIZABLE if you:

  • Have a high probability of no one else modifying the same data
  • Need transaction-level read consistency
  • Will be doing short transactions (to help make the first bullet point a reality)
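Here is the retry sketch mentioned above. It is only an illustration: it assumes the update can safely be re-run from the top, and that the Oracle JDBC driver surfaces ORA-08177 as vendor error code 8177; both assumptions should be verified in your own environment.

import java.sql.*;

public class SerializableRetry {
    private static final int ORA_CANT_SERIALIZE = 8177; // ORA-08177

    public static void main(String[] args) throws SQLException {
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/XE", "scott", "tiger")) {
            con.setAutoCommit(false);
            for (int attempt = 1; attempt <= 3; attempt++) {
                con.setTransactionIsolation(Connection.TRANSACTION_SERIALIZABLE);
                try (PreparedStatement upd = con.prepareStatement(
                        "update accounts set account_balance = account_balance - 400 " +
                        "where account_number = ?")) {
                    upd.setInt(1, 123);
                    upd.executeUpdate();
                    con.commit();
                    return; // success
                } catch (SQLException e) {
                    con.rollback();
                    if (e.getErrorCode() != ORA_CANT_SERIALIZE) {
                        throw e; // some other problem: don't retry
                    }
                    // The gamble didn't pay off; re-run the whole transaction.
                }
            }
            System.err.println("Gave up after repeated ORA-08177 errors");
        }
    }
}

Note that the whole transaction is restarted, not just the failing statement; anything the transaction did before the error is gone after the rollback.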

But—there is always a "but"—you must understand these different isolation levels and their implications. Remember, with isolation set to SERIALIZABLE, you won't see any changes made in the database after the start of your transaction, until you commit. Applications that try to enforce their own data integrity constraints, such as the resource scheduler described in Chapter 1 [of Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions], must take extra care in this regard. The problem in Chapter 1 was that you couldn't enforce your integrity constraint in a multiuser system since you couldn't see changes made by other uncommitted sessions. Using SERIALIZABLE, you'd still not see the uncommitted changes, but you'd also not see the committed changes made after your transaction began.

As a final point, be aware that SERIALIZABLE does not mean that all transactions executed by users will behave as if they were executed one right after another in a serial fashion. It doesn't imply that there's some serial ordering of the transactions that will result in the same outcome. Preventing the phenomena described previously by the SQL standard does not, by itself, guarantee that. This last point is a frequently misunderstood concept, and a small demonstration will clear it up. The following table represents two sessions performing work over time. The database tables A and B start out empty and are created as follows:

ops$tkyte@ORA10G> create table a ( x int );
Table created.

ops$tkyte@ORA10G> create table b ( x int );
Table created.

Now I have the series of events shown in Table 7.

Table 7: SERIALIZABLE transaction example
Time Session 1 Executes Session 2 Executes
T1 Alter session set isolation_level=serializable; --
T2 -- Alter session set isolation_level=serializable;
T3 Insert into a select count(*) from b; --
T4 -- Insert into b select count(*) from a;
T5 Commit; --
T6 -- Commit;

Now, when the processes shown in Table 7 are all said and done, tables A and B will each have a row with the value 0. If there were some serial ordering of the transactions, I couldn't possibly have both tables each containing a row with the value 0. If session 1 executed before session 2, then table B would have a row with the value 1 in it. If session 2 executed before session 1, then table A would have a row with the value 1 in it. As executed here, however, both tables will have rows with a value of 0. They just executed as if they were the only transaction in the database at that point in time. No matter how many times session 1 queries table B, the count will be the count that was committed in the database at time T1. Likewise, no matter how many times session 2 queries table A, the count will be the same as it was at time T2.

Handling concurrency in long transactions

Isolated transactions, optimistic locking, and pessimistic locking only work within a single database transaction. However, many applications have use-cases that are long running and that consist of multiple database transactions that read and update shared data. For example, suppose a use-case describes how a user edits an order (the shared data). This is a relatively lengthy process, which might take as long as several minutes and consists of multiple database transactions. Because data is read in one database transaction and modified in another, the application must handle concurrent access to shared data differently. It must use the Optimistic Offline Lock pattern or the Pessimistic Offline Lock pattern, two more patterns described by Fowler in Patterns of Enterprise Application Architecture.
Optimistic Offline Lock pattern

One option is to extend the optimistic locking mechanism described earlier and check in the final database transaction of the editing process that the data has not changed since it was first read. You can, for example, do this by using a version number column in the shared data's table. At the start of the editing process, the application stores the version number in the session state. Then, when the user saves their changes, the application makes sure that the saved version number matches the version number in the database.

Because the Optimistic Offline Lock pattern only detects changes when the user tries to save their changes, it only works well when starting over is not a burden on the user. When implementing such use-cases where the user would be extremely annoyed by having to discard several minutes' work, a much better option is to use the Pessimistic Offline Lock.
Pessimistic Offline Lock pattern

The Pessimistic Offline Lock pattern handles concurrent updates across a sequence of database transactions by locking the shared data at the start of the editing process, which prevents other users from editing it. It is similar to the pessimistic locking mechanism described earlier except that the locks are implemented by the application rather than the database. Because only one user at a time is able to edit the shared data, they are guaranteed to be able to save their changes.
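A minimal sketch of one way to implement this, assuming a hypothetical ORDER_LOCKS table with a primary key on ORDER_ID: claiming the edit lock is an INSERT that fails if another user already holds it, and the lock is released with a DELETE when the edit finishes or is cancelled. The caller owns the connection and transaction boundaries.

import java.sql.*;

public class OrderLockDao {
    private final Connection con;

    public OrderLockDao(Connection con) {
        this.con = con;
    }

    /** Try to claim the edit lock; returns false if another user already holds it. */
    public boolean acquire(long orderId, String userName) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "insert into order_locks (order_id, locked_by, locked_at) " +
                "values (?, ?, current_timestamp)")) {
            ps.setLong(1, orderId);
            ps.setString(2, userName);
            ps.executeUpdate();
            return true;
        } catch (SQLIntegrityConstraintViolationException alreadyLocked) {
            // The primary key on order_id means someone else holds the lock.
            // Note: some JDBC drivers report constraint violations as a plain
            // SQLException; check the vendor error code in that case.
            return false;
        }
    }

    /** Release the lock when the edit is saved or abandoned. */
    public void release(long orderId, String userName) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "delete from order_locks where order_id = ? and locked_by = ?")) {
            ps.setLong(1, orderId);
            ps.setString(2, userName);
            ps.executeUpdate();
        }
    }
}

In practice such locks also need an expiry or cleanup strategy, since a user who walks away would otherwise leave the order locked indefinitely.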

Pessimistic locking

An alternative to optimistic locking is pessimistic locking. A transaction acquires locks on the rows when it reads them, which prevent other transactions from accessing the rows. The details depend on the database, and unfortunately not all databases support pessimistic locking. If it is supported by the database, it is quite easy to implement a pessimistic locking mechanism in an application that executes SQL statements directly. However, as you would expect, using pessimistic locking in a JDO or Hibernate application is even easier. JDO provides pessimistic locking as a configuration option, and Hibernate provides a simple programmatic API for locking objects.

In addition to handling concurrency within a single database transaction, you must often handle concurrency across a sequence of database transactions.

Optimistic locking

One way to handle concurrent updates is to use optimistic locking. Optimistic locking works by having the application check whether the data it is about to update has been changed by another transaction since it was read. One common way to implement optimistic locking is to add a version column to each table, which is incremented by the application each time it changes a row. Each UPDATE statement's WHERE clause checks that the version number has not changed since it was read. An application can determine whether the UPDATE statement succeeded by checking the row count returned by PreparedStatement.executeUpdate(). If the row has been updated or deleted by another transaction, the application can roll back the transaction and start over.
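A minimal JDBC sketch of that idea, assuming a hypothetical ORDERS table with STATUS and VERSION columns and a connection with auto-commit disabled: the UPDATE succeeds only when the version the application originally read is still current, and the row count from executeUpdate() decides whether to commit or start over.

import java.sql.*;

public class OptimisticOrderUpdate {
    /**
     * Returns true if the update succeeded, false if another transaction
     * changed (or deleted) the row since we read it.
     */
    public boolean updateStatus(Connection con, long orderId,
                                String newStatus, int versionWhenRead) throws SQLException {
        try (PreparedStatement ps = con.prepareStatement(
                "update orders set status = ?, version = version + 1 " +
                "where order_id = ? and version = ?")) {
            ps.setString(1, newStatus);
            ps.setLong(2, orderId);
            ps.setInt(3, versionWhenRead);
            int rows = ps.executeUpdate(); // 0 means someone got there first
            if (rows == 1) {
                con.commit();
                return true;
            }
            con.rollback();                // stale data: let the caller start over
            return false;
        }
    }
}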

It is quite easy to implement an optimistic locking mechanism in an application that executes SQL statements directly. But it is even easier when using persistence frameworks such as JDO and Hibernate because they provide optimistic locking as a configuration option. Once it is enabled, the persistence framework automatically generates SQL UPDATE statements that perform the version check.

Optimistic locking derives its name from the fact it assumes that concurrent updates are rare and that, instead of preventing them, the application detects and recovers from them. An alternative approach is to use pessimistic locking, which assumes that concurrent updates will occur and must be prevented.

A study of the cultural effects of designing a user interface for a web-based service

Pei-Luen Patrick Rau and Sheau-Farn Max Liang

A1 Department of Management Information Systems, Chung Yuan Christian University, Chunli 320, Taiwan
A2 Department of Industrial Engineering and Management, National Taipei University of Technology, Taipei, Taiwan

Abstract:

With the globalisation of markets and the internet, customer satisfaction now depends on how seriously service providers take into account regional user needs combined with culturally-suitable user interface design. The objective of this study was to investigate the effects of two cultural dimensions, Time Orientation (TO) and Communication Style (CS), on user performance in browsing a web-based service. A comparison in Hofstede's cultural dimensions was also made between two Chinese-speaking student groups. The results showed that participants with polychronic TO performed faster and took fewer steps than those with monochronic TO. Participants with high-context CS were more disorientated than those with low-context CS. The comparison in cultural dimensions indicated that there is a variety even within Chinese-speaking cultures. To be able to meet customers' expectations and increase their satisfaction, it is necessary for user interface designers to consider these cultural effects.

Keywords:

cultural differences, user interface design, web-based service, Chinese

Wednesday, August 6, 2008

REST Anti-Patterns

When people start trying out REST, they usually start looking around for examples – and not only find a lot of examples that claim to be “RESTful”, or are labeled as a “REST API”, but also dig up a lot of discussions about why a specific service that claims to do REST actually fails to do so.

Why does this happen? HTTP is nothing new, but it has been applied in a wide variety of ways. Some of them were in line with the ideas the Web’s designers had in mind, but many were not. Applying REST principles to your HTTP applications, whether you build them for human consumption, for use by another program, or both, means that you do the exact opposite: You try to use the Web “correctly”, or if you object to the idea that one is “right” and one is “wrong”: in a RESTful way. For many, this is indeed a very new approach.

The usual standard disclaimer applies: REST, the Web, and HTTP are not the same thing; REST could be implemented with many different technologies, and HTTP is just one concrete architecture that happens to follow the REST architectural style. So I should actually be careful to distinguish “REST” from “RESTful HTTP”. I’m not, so let’s just assume the two are the same for the remainder of this article.

As with any new approach, it helps to be aware of some common patterns. In the first two articles of this series, I’ve tried to outline some basic ones – such as the concept of collection resources, the mapping of calculation results to resources in their own right, or the use of syndication to model events. A future article will expand on these and other patterns. For this one, though, I want to focus on anti-patterns – typical examples of attempted RESTful HTTP usage that create problems and show that someone has attempted, but failed, to adopt REST ideas.

Let’s start with a quick list of anti-patterns I’ve managed to come up with:

  1. Tunneling everything through GET
  2. Tunneling everything through POST
  3. Ignoring caching
  4. Ignoring response codes
  5. Misusing cookies
  6. Forgetting hypermedia
  7. Ignoring MIME types
  8. Breaking self-descriptiveness

Let’s go through each of them in detail.

Tunneling everything through GET

To many people, REST simply means using HTTP to expose some application functionality. The fundamental and most important operation (strictly speaking, “verb” or “method” would be a better term) is an HTTP GET. A GET should retrieve a representation of a resource identified by a URI, but many, if not all existing HTTP libraries and server programming APIs make it extremely easy to view the URI not as a resource identifier, but as a convenient means to encode parameters. This leads to URIs like the following:

http://example.com/some-api?method=deleteCustomer&id=1234

The characters that make up a URI do not, in fact, tell you anything about the “RESTfulness” of a given system, but in this particular case, we can guess the GET will not be “safe”: The caller will likely be held responsible for the outcome (the deletion of a customer), although the spec says that GET is the wrong method to use for such cases.

The only thing in favor of this approach is that it’s very easy to program, and trivial to test from a browser – after all, you just need to paste a URI into your address bar, tweak some “parameters”, and off you go. The main problems with this anti-pattern are:

  1. Resources are not identified by URIs; rather, URIs are used to encode operations and their parameters
  2. The HTTP method does not necessarily match the semantics
  3. Such links are usually not intended to be bookmarked
  4. There is a risk that “crawlers” (e.g. from search engines such as Google) cause unintended side effects

Note that APIs that follow this anti-pattern might actually end up being accidentally restful. Here is an example:

http://example.com/some-api?method=findCustomer&id=1234

Is this a URI that identifies an operation and its parameters, or does it identify a resource? You could argue both cases: This might be a perfectly valid, bookmarkable URI; doing a GET on it might be “safe”; it might respond with different formats according to the Accept header, and support sophisticated caching. In many cases, this will be unintentional. Often, APIs start this way, exposing a “read” interface, but when developers start adding “write” functionality, you find out that the illusion breaks (it’s unlikely an update to a customer would occur via a PUT to this URI – the developer would probably create a new one).
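To make the contrast concrete, here is a small client-side sketch of the non-tunneled alternative: the customer is identified by its own URI (a hypothetical http://example.com/customers/1234), and the intent is carried by the HTTP method, so deleting it is a DELETE rather than a GET with method=deleteCustomer.

import java.net.HttpURLConnection;
import java.net.URL;

public class DeleteCustomer {
    public static void main(String[] args) throws Exception {
        // The URI identifies a resource, not an operation.
        URL customer = new URL("http://example.com/customers/1234");
        HttpURLConnection conn = (HttpURLConnection) customer.openConnection();
        conn.setRequestMethod("DELETE");   // the verb carries the semantics

        int status = conn.getResponseCode();
        if (status == 200 || status == 204) {
            System.out.println("Customer deleted");
        } else {
            System.out.println("Delete failed with status " + status);
        }
        conn.disconnect();
    }
}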

Tunneling everything through POST

This anti-pattern is very similar to the first one, only that this time, the POST HTTP method is used. POST carries an entity body, not just a URI. A typical scenario uses a single URI to POST to, and varying messages to express differing intents. This is actually what SOAP 1.1 web services do when HTTP is used as a “transport protocol”: It’s actually the SOAP message, possibly including some WS-Addressing SOAP headers, that determines what happens.

One could argue that tunneling everything through POST shares all of the problems of the GET variant; it’s just a little harder to use, cannot exploit caching (not even accidentally), and doesn’t support bookmarking. It actually doesn’t end up violating any REST principles so much – it simply ignores them.

Ignoring caching

Even if you use the verbs as they are intended to be used, you can still easily ruin caching opportunities. The easiest way to do so is by simply including a header such as this one in your HTTP response:

Cache-control: no-cache

Doing so will simply prevent caches from caching anything. Of course this may be what you intend to do, but more often than not it’s just a default setting that’s specified in your web framework. However, supporting efficient caching and re-validation is one of the key benefits of using RESTful HTTP. Sam Ruby suggests that a key question to ask when assessing something’s RESTfulness is “do you support ETags”? (ETags are a mechanism introduced in HTTP 1.1 to allow a client to validate whether a cached representation is still valid, by means of a cryptographic checksum). The easiest way to generate correct headers is to delegate this task to a piece of infrastructure that “knows” how to do this correctly – for example, by generating a file in a directory served by a Web server such as Apache HTTPD.

Of course there’s a client side to this, too: when you implement a programmatic client for a RESTful service, you should actually exploit the caching capabilities that are available, and not unnecessarily retrieve a representation again. For example, the server might have sent the information that the representation is to be considered “fresh” for 600 seconds after a first retrieval (e.g. because a back-end system is polled only every 30 minutes). There is absolutely no point in repeatedly requesting the same information in a shorter period. Similarly to the server side of things, going with a proxy cache such as Squid on the client side might be a better option than building this logic yourself.
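Here is a client-side sketch of that revalidation, using plain HttpURLConnection against a hypothetical resource URI: the ETag from the first response is echoed back in If-None-Match, and a 304 response means the cached copy can be reused without transferring the body again.

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ConditionalGet {
    public static void main(String[] args) throws Exception {
        URL resource = new URL("http://example.com/orders/42");

        // First retrieval: remember the entity tag along with the body.
        HttpURLConnection first = (HttpURLConnection) resource.openConnection();
        String etag = first.getHeaderField("ETag");
        byte[] cachedBody;
        try (InputStream in = first.getInputStream()) {
            cachedBody = in.readAllBytes();
        }

        // Later retrieval: ask the server to send the body only if it changed.
        HttpURLConnection second = (HttpURLConnection) resource.openConnection();
        if (etag != null) {
            second.setRequestProperty("If-None-Match", etag);
        }
        if (second.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
            System.out.println("Representation unchanged; reusing "
                    + cachedBody.length + " cached bytes");
        } else {
            try (InputStream in = second.getInputStream()) {
                cachedBody = in.readAllBytes();  // refresh the cache
            }
        }
    }
}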

Caching in HTTP is powerful and complex; for a very good guide, turn to Mark Nottingham’s Cache Tutorial.

Ignoring status codes

Unknown to many Web developers, HTTP has a very rich set of application-level status codes for dealing with different scenarios. Most of us are familiar with 200 (“OK”), 404 (“Not found”), and 500 (“Internal server error”). But there are many more, and using them correctly means that clients and servers can communicate on a semantically richer level.

For example, a 201 (“Created”) response code signals that a new resource has been created, the URI of which can be found in a Location header in the response. A 409 (“Conflict”) informs the client that there is a conflict, e.g. when a PUT is used with data based on an older version of a resource. A 412 (“Precondition Failed”) says that the server couldn’t meet the client’s expectations.

Another aspect of using status codes correctly affects the client: The status codes in different classes (e.g. all in the 2xx range, all in the 5xx range) are supposed to be treated according to a common overall approach – e.g. a client should treat all 2xx codes as success indicators, even if it hasn’t been coded to handle the specific code that has been returned.
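For example, a generic client might branch on the class of the code rather than on individual values, while still picking up specifics such as the Location header of a 201; here is a rough sketch against a hypothetical collection resource.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class StatusHandling {
    public static void main(String[] args) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://example.com/orders").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write("<order/>".getBytes("UTF-8"));
        }

        int code = conn.getResponseCode();
        switch (code / 100) {              // treat whole classes of codes uniformly
            case 2:
                if (code == HttpURLConnection.HTTP_CREATED) {
                    // 201: the URI of the new resource is in the Location header
                    System.out.println("Created: " + conn.getHeaderField("Location"));
                } else {
                    System.out.println("Success: " + code);
                }
                break;
            case 4:
                System.out.println("Client-side problem: " + code);
                break;
            case 5:
                System.out.println("Server-side problem: " + code);
                break;
            default:
                System.out.println("Unexpected status: " + code);
        }
    }
}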

Many applications that claim to be RESTful return only 200 or 500, or even 200 only (with a failure text contained in the response body – again, see SOAP). If you want, you can call this “tunneling errors through status code 200”, but whatever you consider to be the right term: if you don’t exploit the rich application semantics of HTTP’s status codes, you’re missing an opportunity for increased re-use, better interoperability, and looser coupling.

Misusing cookies

Using cookies to propagate a key to some server-side session state is another REST anti-pattern.

Cookies are a sure sign that something is not RESTful. Right? No; not necessarily. One of the key ideas of REST is statelessness – not in the sense that a server cannot store any data: it’s fine if there is resource state, or client state. It’s session state that is disallowed due to scalability, reliability and coupling reasons. The most typical use of cookies is to store a key that links to some server-side data structure that is kept in memory. This means that the cookie, which the browser passes along with each request, is used to establish conversational, or session, state.

If a cookie is used to store some information, such as an authentication token, that the server can validate without reliance on session state, cookies are perfectly RESTful – with one caveat: They shouldn’t be used to encode information that can be transferred by other, more standardized means (e.g. in the URI, some standard header or – in rare cases – in the message body). For example, it’s preferable to use HTTP authentication from a RESTful HTTP point of view.

Forgetting hypermedia

The first REST idea that’s hard to accept is the standard set of methods. REST theory doesn’t specify which methods make up the standard set; it just says there should be a limited set that is applicable to all resources. HTTP fixes them at GET, PUT, POST and DELETE (primarily, at least), and casting all of your application semantics into just these four verbs takes some getting used to. But once you’ve done that, it’s easy to stop there and use only a subset of what actually makes up REST – a sort of Web-based CRUD (Create, Read, Update, Delete) architecture. Applications that expose this anti-pattern are not really “unRESTful” (if there even is such a thing), they just fail to exploit another of REST’s core concepts: hypermedia as the engine of application state.

Hypermedia, the concept of linking things together, is what makes the Web a web – a connected set of resources, where applications move from one state to the next by following links. That might sound a little esoteric, but in fact there are some valid reasons for following this principle.

The first indicator of the “Forgetting hypermedia” anti-pattern is the absence of links in representations. There is often a recipe for constructing URIs on the client side, but the client never follows links because the server simply doesn’t send any. A slightly better variant uses a mixture of URI construction and link following, where links typically represent relations in the underlying data model. But ideally, a client should have to know a single URI only; everything else – individual URIs, as well as recipes for constructing them e.g. in case of queries – should be communicated via hypermedia, as links within resource representations. A good example is the Atom Publishing Protocol with its notion of service documents, which offer named elements for each collection within the domain that it describes. Finally, the possible state transitions the application can go through should be communicated dynamically, and the client should be able to follow them with as little before-hand knowledge of them as possible. A good example of this is HTML, which contains enough information for the browser to offer a fully dynamic interface to the user.

I considered adding “human readable URIs” as another anti-pattern. I did not, because I like readable and “hackable” URIs as much as anybody. But when someone starts with REST, they often waste endless hours in discussions about the “correct” URI design and totally forget the hypermedia aspect. So my advice would be to limit the time you spend on finding the perfect URI design (after all, they’re just strings), and invest some of that energy into finding good places to provide links within your representations.

Ignoring MIME types

HTTP’s notion of content negotiation allows a client to retrieve different representations of resources based on its needs. For example, a resource might have a representation in different formats such as XML, JSON, or YAML, for consumption by consumers implemented in Java, JavaScript, and Ruby respectively. Or there might be a “machine-readable” format such as XML in addition to a PDF or JPEG version for humans. Or it might support both the v1.1 and the v1.2 versions of some custom representation format. In any case, while there may be good reasons for having one representation format only, it’s often an indication of another missed opportunity.

It’s probably obvious that the more unforeseen clients are able to (re-)use a service, the better. For this reason, it’s much better to rely on existing, pre-defined, widely-known formats than to invent proprietary ones – an argument that leads to the last anti-pattern addressed in this article.

Breaking self-descriptiveness

This anti-pattern is so common that it’s visible in almost every REST application, even in those created by those who call themselves “RESTafarians” – myself included: breaking the constraint of self-descriptiveness (which is an ideal that has less to do with AI science fiction than one might think at first glance). Ideally, a message – an HTTP request or HTTP response, including headers and the body – should contain enough information for any generic client, server or intermediary to be able to process it. For example, when your browser retrieves some protected resource’s PDF representation, you can see how all of the existing agreements in terms of standards kick in: some HTTP authentication exchange takes place, there might be some caching and/or revalidation, the content-type header sent by the server (“application/pdf”) triggers the startup of the PDF viewer registered on your system, and finally you can read the PDF on your screen. Any other user in the world could use his or her own infrastructure to perform the same request. If the server developer adds another content type, any of the server’s clients (or service’s consumers) just need to make sure they have the appropriate viewer installed.

Every time you invent your own headers, formats, or protocols, you break the self-descriptiveness constraint to a certain degree. If you want to take an extreme position, anything not standardized by an official standards body breaks this constraint, and can be considered a case of this anti-pattern. In practice, you should strive to follow standards as much as possible, and accept that some convention might only apply in a smaller domain (e.g. your service and the clients specifically developed against it).

Summary

Ever since the “Gang of Four” published their book, which kick-started the patterns movement, many people misunderstood it and tried to apply as many patterns as possible – a notion that has been ridiculed for equally as long. Patterns should be applied if, and only if, they match the context. Similarly, one could religiously try to avoid all of the anti-patterns in any given domain. In many cases, there are good reasons for violating any rule, or in REST terminology: relax any particular constraint. It’s fine to do so – but it’s useful to be aware of the fact, and then make a more informed decision.

Hopefully, this article helps you to avoid some of the most common pitfalls when starting your first REST projects.

Many thanks to Javier Botana and Burkhard Neppert for feedback on a draft of this article.

An Introduction to Virtualization

The IT industry makes heavy use of buzzwords and ever changing terms to define itself. Sometimes the latest nomenclature the industry uses is a particular technology such as x86 or a concept such as green computing. Terms rise and fall out of favor as the industry evolves. In recent years the term virtualization has become the industry’s newest buzzword. This raises the question … just what is virtualization? The first concept that comes to the mind of the average industry professional is running one or more guest operating systems on a host. However, digging a little deeper reveals this definition is too narrow. There are a large number of services, hardware, and software that can be “virtualized”. This article will take a look at these different types of virtualization along with the pros and cons of each.

What is virtualization?

Before discussing the different categories of virtualization in detail, it is useful to define the term in the abstract sense. Wikipedia uses the following definition: “In computing, virtualization is a broad term that refers to the abstraction of computer resources. Virtualization hides the physical characteristics of computing resources from their users, be they applications, or end users. This includes making a single physical resource (such as a server, an operating system, an application, or storage device) appear to function as multiple virtual resources; it can also include making multiple physical resources (such as storage devices or servers) appear as a single virtual resource...”

In layman’s terms virtualization is often:

  1. The creation of many virtual resources from one physical resource.
  2. The creation of one virtual resource from one or more physical resources.

The term is frequently used to convey one of these concepts in a variety of areas such as networking, storage, and hardware.

History

Virtualization is not a new concept. One of the early works in the field was a paper by Christopher Strachey entitled "Time Sharing in Large Fast Computers". IBM began exploring virtualization with its CP-40 and M44/44X research systems. These in turn led to the commercial CP-67/CMS. The virtual machine concept kept users separated while simulating a full stand-alone computer for each.

In the ’80s and early ’90s the industry moved from leveraging singular mainframes to running collections of smaller and cheaper x86 servers. As a result, the concept of virtualization became less prominent. That changed in 1999 with VMware’s introduction of VMware Workstation. This was followed by VMware’s ESX Server, which runs on bare metal and does not require a host operating system.

Types of Virtualization

Today the term virtualization is widely applied to a number of concepts including:

  • Server Virtualization
  • Client / Desktop / Application Virtualization
  • Network Virtualization
  • Storage Virtualization
  • Service / Application Infrastructure Virtualization

In most of these cases, either virtualizing one physical resource into many virtual resources or turning many physical resources into one virtual resource is occurring.

Server Virtualization

Server virtualization is the most active segment of the virtualization industry, featuring established companies such as VMware, Microsoft, and Citrix. With server virtualization, one physical machine is divided into many virtual servers. At the core of such virtualization is the concept of a hypervisor (virtual machine monitor). A hypervisor is a thin software layer that intercepts operating system calls to hardware. Hypervisors typically provide a virtualized CPU and memory for the guests running on top of them. The term was first used in conjunction with the IBM CP-370.

Hypervisors are classified as one of two types:

  • Type 1 hypervisors run directly on the host hardware (bare metal), with guest operating systems running on top of them. VMware ESX Server and Xen fall into this category.
  • Type 2 hypervisors run as an application inside a conventional host operating system. VMware Workstation and Microsoft Virtual PC are examples.

Related to type 1 hypervisors is the concept of paravirtualization. Paravirtualization is a technique in which a software interface that is similar but not identical to the underlying hardware is presented. Operating systems must be ported to run on top of a paravirtualized hypervisor. Modified operating systems use the "hypercalls" supported by the paravirtualized hypervisor to interface directly with the hardware. The popular Xen project makes use of this type of virtualization. Starting with version 3.0 however Xen is also able to make use of the hardware assisted virtualization technologies of Intel (VT-x) and AMD (AMD-V). These extensions allow Xen to run unmodified operating systems such as Microsoft Windows.

Server virtualization has a large number of benefits for the companies making use of the technology. Among those frequently listed:

  • Increased Hardware Utilization – This results in hardware savings, reduced administration overhead, and energy savings.
  • Security – Clean images can be used to restore compromised systems. Virtual machines can also provide sandboxing and isolation to limit attacks.
  • Development – Debugging and performance monitoring scenarios can be easily setup in a repeatable fashion. Developers also have easy access to operating systems they might not otherwise be able to install on their desktops.

Correspondingly there are a number of potential downsides that must be considered:

  • Security – There are now more entry points such as the hypervisor and virtual networking layer to monitor. A compromised image can also be propagated easily with virtualization technology.
  • Administration – While there are fewer physical machines to maintain, there may be more machines in aggregate. Such maintenance may require new skills and familiarity with software that administrators otherwise would not need.
  • Licensing/Cost Accounting – Many software-licensing schemes do not take virtualization into account. For example running 4 copies of Windows on one box may require 4 separate licenses.
  • Performance – Virtualization effectively partitions resources such as RAM and CPU on a physical machine. This combined with hypervisor overhead does not result in an environment that focuses on maximizing performance.

Application/Desktop Virtualization

Virtualization is not only a server domain technology. It is being put to a number of uses on the client side at both the desktop and application level. Such virtualization can be broken out into four categories:

  • Local Application Virtualization/Streaming
  • Hosted Application Virtualization
  • Hosted Desktop Virtualization
  • Local Desktop Virtualization

Wikipedia defines application virtualization as follows:

Application virtualization is an umbrella term that describes software technologies that improve manageability and compatibility of legacy applications by encapsulating applications from the underlying operating system on which they are executed. A fully virtualized application is not installed in the traditional sense, although it is still executed as if it is. Application virtualization differs from operating system virtualization in that in the latter case, the whole operating system is virtualized rather than only specific applications.

With streamed and local application virtualization, an application can be installed on demand as needed. If streaming is enabled, the portions of the application needed for startup are sent first, optimizing startup time. Locally virtualized applications also frequently make use of virtual registries and file systems to maintain separation from, and avoid cluttering, the user’s physical machine. Examples of local application virtualization solutions include Citrix Presentation Server and Microsoft SoftGrid. One could also include virtual appliances in this category, such as those frequently distributed via VMware’s VMware Player.

Hosted application virtualization allows the user to access applications from their local computer that are physically running on a server somewhere else on the network. Technologies such as Microsoft’s RemoteApp allow the user experience to be relatively seamless, including the ability for the remote application to act as a file handler for local file types.

Benefits of application virtualization include:

  • Security – Virtual applications often run in user mode isolating them from OS level functions.
  • Management – Virtual applications can be managed and patched from a central location.
  • Legacy Support – Through virtualization technologies legacy applications can be run on modern operating systems they were not originally designed for.
  • Access – Virtual applications can be installed on demand from central locations that provide failover and replication.

Disadvantages include:

  • Packaging – Applications must first be packaged before they can be used.
  • Resources – Virtual applications may require more resources in terms of storage and CPU.
  • Compatibility – Not all applications can be virtualized easily.

Wikipedia defines desktop virtualization as:

Desktop virtualization (or Virtual Desktop Infrastructure) is a server-centric computing model that borrows from the traditional thin-client model but is designed to give administrators and end users the best of both worlds: the ability to host and centrally manage desktop virtual machines in the data center while giving end users a full PC desktop experience.

Hosted desktop virtualization is similar to hosted application virtualization, expanding the user experience to be the entire desktop. Commercial products include Microsoft’s Terminal Services, Citrix’s XenDesktop, and VMware’s VDI.

Benefits of desktop virtualization include most of those with application virtualization as well as:

  • High Availability – Downtime can be minimized with replication and fault tolerant hosted configurations.
  • Extended Refresh Cycles – Because larger-capacity servers do the heavy lifting, demands on client PCs are limited, which can extend their lifespan.
  • Multiple Desktops – Users can access multiple desktops suited for various tasks from the same client PC.

Disadvantages of desktop virtualization are similar to server virtualization. There is also the added disadvantage that clients must have network connectivity to access their virtual desktops. This is problematic for offline work and also increases network demands at the office.

The final segment of client virtualization is local desktop virtualization. It could be said that this is where the recent resurgence of virtualization began, with VMware’s introduction of VMware Workstation in the late 1990s. Today the market includes competitors such as Microsoft Virtual PC and Parallels Desktop. Local desktop virtualization has also played a key part in the increasing success of Apple’s move to Intel processors, since products like VMware Fusion and Parallels allow easy access to Windows applications. Some of the benefits of local desktop virtualization include:

  • Security – With local virtualization, organizations can lock down and encrypt just the valuable contents of the virtual machine/disk. This can be more performant than encrypting a user’s entire disk or operating system.
  • Isolation – Related to security is isolation. Virtual machines allow corporations to isolate corporate assets from third-party machines they do not control. This allows employees to use personal computers for corporate use in some instances.
  • Development/Legacy Support – Local virtualization allows a user’s computer to support many configurations and environments it could not otherwise support without different hardware or a different host operating system. Examples include running Windows in a virtualized environment on OS X and testing legacy Windows 98 support on a machine whose primary OS is Vista.

Network Virtualization

Up to this point the types of virtualization covered have centered on applications or entire machines. However, these are not the only granularity levels that can be virtualized. Other computing concepts also lend themselves to virtualization in software. Network virtualization is one such concept. Wikipedia defines network virtualization as:

In computing, network virtualization is the process of combining hardware and software network resources and network functionality into a single, software-based administrative entity, a virtual network. Network virtualization involves platform virtualization, often combined with resource virtualization. Network virtualization is categorized as either external, combining many networks, or parts of networks, into a virtual unit, or internal, providing network-like functionality to the software containers on a single system…

Using the internal definition of the term, desktop and server virtualization solutions provide networking access both between the host and guest and between many guests. On the server side, virtual switches are gaining acceptance as a part of the virtualization stack. The external definition of network virtualization is probably the more commonly used sense of the term, however. Virtual Private Networks (VPNs) have been a common component of the network administrator’s toolbox for years, with most companies allowing VPN use. Virtual LANs (VLANs) are another commonly used network virtualization concept; a toy illustration appears below. With network advances such as 10 gigabit Ethernet, networks no longer need to be structured purely along geographical lines. Companies with products in the space include Cisco and 3Leaf.
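
As a rough illustration of the VLAN idea, the following toy Python sketch models a switch whose ports are partitioned into isolated virtual broadcast domains by VLAN ID. Real equipment implements this with 802.1Q frame tagging in hardware; the class and port names here are invented for illustration only.

```python
# Illustrative sketch only: one physical switch partitioned into isolated
# virtual broadcast domains by VLAN ID.
from collections import defaultdict


class VirtualSwitch:
    def __init__(self):
        # vlan_id -> set of ports that belong to that virtual network
        self.vlans = defaultdict(set)

    def assign_port(self, port, vlan_id):
        self.vlans[vlan_id].add(port)

    def broadcast(self, from_port, vlan_id, frame):
        if from_port not in self.vlans[vlan_id]:
            raise ValueError("port is not a member of this VLAN")
        # Frames only reach ports in the same VLAN, even though all ports
        # share the same physical switch.
        return [p for p in self.vlans[vlan_id] if p != from_port]


switch = VirtualSwitch()
switch.assign_port("port1", vlan_id=10)   # e.g. engineering
switch.assign_port("port2", vlan_id=10)
switch.assign_port("port3", vlan_id=20)   # e.g. finance
print(switch.broadcast("port1", 10, frame=b"hello"))  # ['port2'] only
```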

In general benefits of network virtualization include:

  • Customization of Access – Administrators can quickly customize access and network options such as bandwidth throttling and quality of service.
  • Consolidation – Physical networks can be combined into one virtual network for overall simplification of management.

Similar to server virtualization, network virtualization can bring increased complexity, some performance overhead, and the need for administrators to have a larger skill set.

Storage Virtualization

Another computing concept that is frequently virtualized is storage. Unlike the sometimes complex definitions we have seen up to this point, Wikipedia defines storage virtualization simply as:

Storage virtualization refers to the process of abstracting logical storage from physical storage.

While RAID at the basic level provides this functionality, the term storage virtualization typically includes additional concepts such as data migration and caching. Storage virtualization is hard to define in a fixed manner due to the variety of ways that the functionality can be provided. Typically, it is provided as a feature of:

  • Host Based with Special Device Drivers
  • Array Controllers
  • Network Switches
  • Stand Alone Network Appliances

Each vendor has a different approach in this regard. Another primary way that storage virtualization is classified is whether it is in-band or out-of-band. In-band (often called symmetric) virtualization sits in the data path between the host and the storage device, which allows caching. Out-of-band (often called asymmetric) virtualization makes use of special host-based device drivers that first look up the metadata (indicating where a file resides) and then let the host retrieve the file directly from the storage location. Caching at the virtualization level is not possible with this approach; the sketch below illustrates the difference.
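
The following toy Python sketch, offered only as an assumption-laden illustration, models the out-of-band (asymmetric) flow described above: the host consults a metadata service on the control path, then reads the data directly from the physical location, so the virtualization layer never sees (and therefore cannot cache) the data itself. All class names are hypothetical.

```python
# Illustrative sketch only: out-of-band storage virtualization reduced to a
# toy example. Control path (metadata lookup) and data path (direct read)
# are separated.


class MetadataService:
    """Maps logical paths to (storage_node, physical_path) pairs."""

    def __init__(self):
        self.mapping = {}

    def place(self, logical_path, node, physical_path):
        self.mapping[logical_path] = (node, physical_path)

    def lookup(self, logical_path):
        return self.mapping[logical_path]


class StorageNode:
    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write(self, physical_path, data):
        self.blocks[physical_path] = data

    def read(self, physical_path):
        return self.blocks[physical_path]


meta = MetadataService()
node_a = StorageNode("array-a")
node_a.write("/lun0/block42", b"payroll data")
meta.place("/shares/payroll.db", node_a, "/lun0/block42")

node, phys = meta.lookup("/shares/payroll.db")   # control path: where is it?
print(node.read(phys))                           # data path: read it directly
```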

General benefits of storage virtualization include:

  • Migration – With most technologies, data can be migrated between storage locations without interrupting live access to the virtual partition.
  • Utilization – Similar to server virtualization, utilization of storage devices can be balanced to address over- and under-utilization.
  • Management – Many hosts can leverage storage on one physical device that can be centrally managed.

Some of the disadvantages include:

  • Lack of Standards and Interoperability – Storage virtualization is a concept and not a standard. As a result, vendors’ products frequently do not interoperate easily.
  • Metadata – Since there is a mapping between logical and physical locations, the storage metadata and its management become key to a reliable, working system.
  • Backout – The mapping between logical and physical locations also makes backing virtualization technology out of a system a less than trivial process.

Service / Application Infrastructure Virtualization

Enterprise application providers have also taken note of the benefits of virtualization and have begun offering solutions that virtualize commonly used applications such as Apache, as well as application fabric platforms that allow software to be developed with virtualization capabilities from the ground up.

Application infrastructure virtualization (sometimes referred to as application fabrics) unbundles an application from a physical OS and hardware. Application developers write to a virtualization layer, and the fabric handles concerns such as deployment and scaling; a minimal sketch of this idea follows the list below. In essence this is the evolution of grid computing into a fabric form that provides virtualization-level features. Companies such as Appistry and DataSynapse provide features including:

  • Virtualized Distribution
  • Virtualized Processing
  • Dynamic Resource Discovery
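
As a minimal sketch of the fabric idea, the following Python fragment shows application code that submits work to a runtime without ever naming a host; placement and scaling are left to the fabric. FabricRuntime and its methods are invented for illustration and do not correspond to the Appistry, DataSynapse, or WebSphere Virtual Enterprise APIs.

```python
# Illustrative sketch only: "write to the fabric, not the machine".
import concurrent.futures


class FabricRuntime:
    """Stands in for an application fabric: the developer submits work
    units, and the runtime decides where (and how many times) they run."""

    def __init__(self, workers=4):
        # In a real fabric these workers would be discovered dynamically
        # across many machines rather than being local threads.
        self.pool = concurrent.futures.ThreadPoolExecutor(max_workers=workers)

    def run_everywhere(self, task, inputs):
        # Application code never names a host; placement and scaling
        # are the fabric's problem.
        return list(self.pool.map(task, inputs))


def score_transaction(txn_id):
    return txn_id, txn_id % 7 == 0   # stand-in business logic


fabric = FabricRuntime(workers=8)
results = fabric.run_everywhere(score_transaction, range(20))
print(results[:3])
```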

IBM has also embraced the virtualization concept at the application infrastructure level with the rebranding and continued enhancement of WebSphere XD as WebSphere Virtual Enterprise. The product provides features such as service-level management, performance monitoring, and fault tolerance. The software runs on a variety of Windows, Unix, and Linux based operating systems and works with popular application servers, including WebSphere, Apache, BEA, JBoss, and PHP application servers. This lets administrators deploy and move application servers at the virtualization layer instead of at the physical machine level.

Final Thoughts

In summary it should now be apparent that virtualization is not just a server-based concept. The technique can be applied across a broad range of computing including the virtualization of:

  • Entire Machines on Both the Server and Desktop
  • Applications/Desktops
  • Storage
  • Networking
  • Application Infrastructure

The technology is evolving in a number of different ways but the central themes revolve around increased stability in existing areas and accelerating adoption by segments of the industry that have yet to embrace virtualization. The recent entry of Microsoft into the bare-metal hypervisor space with Hyper-V is a sign of the technology’s maturity in the industry.

Beyond these core elements the future of virtualization is still being written. A central dividing line is whether virtualization is treated as a feature or as a product. For some companies, such as RedHat and many of the storage vendors, virtualization is being pushed as a feature to complement their existing offerings. Other companies, such as VMware, have built entire businesses with virtualization as the product. InfoQ will continue to cover the technology and companies involved as the space evolves.




Saturday, July 26, 2008

Hackers get hold of critical Internet flaw

SAN FRANCISCO (AFP) - Internet security researchers on Thursday warned that hackers have caught on to a "critical" flaw that lets them control traffic on the Internet.

An elite squad of computer industry engineers that labored in secret to solve the problem released a software "patch" two weeks ago and sought to keep details of the vulnerability hidden for at least a month to give people time to protect computers from attacks.

"We are in a lot of trouble," said IOActive security specialist Dan Kaminsky, who stumbled upon the Domain Name System (DNS) vulnerability about six months ago and reached out to industry giants to collaborate on a solution.

"This attack is very good. This attack is being weaponized out in the field. Everyone needs to patch, please," Kaminsky said. "This is a big deal."

DNS is used by every computer that links to the Internet and works much like a telephone system routing calls to the proper numbers, in this case the online numerical addresses of websites.

The vulnerability allows "cache poisoning" attacks that tinker with data stored in computer memory caches that relay Internet traffic to its destination.

Attackers could use the vulnerability to route Internet users wherever the hackers wanted, no matter what website address is typed into a web browser.
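
For readers unfamiliar with DNS caching, the toy Python model below illustrates why a poisoned cache is so dangerous: once a forged answer lands in the cache, every subsequent lookup for that name is silently redirected, regardless of what the user typed. This is only a conceptual sketch, not the actual vulnerability Kaminsky reported; the hostnames and addresses are made up.

```python
# Illustrative sketch only: why a poisoned DNS cache redirects users.


class CachingResolver:
    def __init__(self, authoritative):
        self.authoritative = authoritative  # the "real" answers
        self.cache = {}

    def resolve(self, hostname):
        # A cached answer is returned without asking anyone else,
        # which is exactly what makes cache poisoning effective.
        if hostname not in self.cache:
            self.cache[hostname] = self.authoritative[hostname]
        return self.cache[hostname]


resolver = CachingResolver({"www.examplebank.com": "203.0.113.10"})
print(resolver.resolve("www.examplebank.com"))   # 203.0.113.10 (legitimate)

# An attacker who wins the race to answer a query can plant a forged record:
resolver.cache["www.examplebank.com"] = "198.51.100.66"  # attacker's server
print(resolver.resolve("www.examplebank.com"))   # 198.51.100.66 (phishing site)
```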

The threat is greatest for business computers handling online traffic or hosting websites, according to security researchers.

The flaw is a boon for "phishing" cons that involve leading people to imitation web pages of businesses such as banks or credit card companies to trick them into disclosing account numbers, passwords and other information.

"I was not intentionally seeking to cause anything that could break the Internet," Kaminsky said Thursday during a conference call with peers and media. "It's a little weird to talk about it out loud."

Kaminsky built a web page, www.doxpara.com, where people can find out whether their computers have the DNS vulnerability. As of Thursday, slightly more than half the computers tested at the website still needed to be patched.

Read the entire article originally published on Yahoo News by Glen Chapman


Unsolicited calls offering Credit Cards, your enhanced rights. Kudos to RBI!


If you receive an unsolicited credit card, you can now make your bank pay for the inconvenience caused to you.

The bank issuing the card will not only have to pay a penalty to the Reserve Bank of India, but also offer monetary compensation to the customer.

The RBI on Thursday issued a circular, which lists out a series of dos and don’ts about the protocol that will have to be maintained in the case of credit cards.

The circular says that if an unsolicited card is issued, activated and billed for without the consent of the customer, the card-issuing bank will not only have to reverse the charges, but also pay a penalty amounting to twice the value of the charges reversed.

Additionally, help is at hand from the banking ombudsman, who will determine the amount of compensation payable for the loss of the complainant’s time, expenses incurred, harassment and the mental anguish suffered by him.

The bank will also be held responsible if a card is misused before it reaches the customer. “It is clarified that any loss arising out of misuse of such unsolicited cards will be the responsibility of the card issuing bank only,” the RBI said.

To prevent misuse, banks have been asked to consider issuing cards with photographs of the cardholder, cards with PINs, and signature-laminated cards.

As in the case of loans, banks will also have to prescribe a ceiling rate of interest, processing and other charges in the case of credit card dues.

Banks which offer accidental death and disability insurance on their cards in tie-ups with insurers have been asked to obtain in writing the details of the nominees for these benefits.

Customers will also have to be given the option to decide whether they want the bank to share their personal information with other agencies.

The original notice to banks from the Reserve Bank of India: http://rbidocs.rbi.org.in/rdocs/notification/PDFs/85811.pdf


Monday, April 28, 2008

India should celebrate World IP Day

To achieve environmental, social and economic progress, humanity needs to innovate. April 26 is World Intellectual Property (IP) Day, a day that should be about celebrating the essential role IP plays in promoting innovation. Instead, World IP Day is becoming a day to take stock of how much human innovation and ingenuity is under threat.

IP has always been a niche public policy area understood best by policy wonks and lawyers. Unless there is a major controversy, IP tends to escape public consciousness. But that is changing.

Over the past few years, campaigns to undermine IP have increased and are now reaching a fever pitch. IP is essential because it provides the property rights needed for research and development to attract investment with the prospect of a long-term dividend.

Undermining IP is equivalent to the traditional socialist ethos – distribute the spoils of today's research and development rather than focus on expanding it. And a lot is at stake – according to the most recent figures from the United Nations, the Indian patent registry receives more than 90,000 applications for patentable inventions each year.

In spite of this significant contribution, there has been a global campaign to undermine IP rights by a group of anti-market activists, self-interested politicians, vested interests, and more recently, the infiltrated World Health Organization.

Innovative medicines have been one of the big targets. These activists have argued that IP rights increase the cost of medicines for the world's poor. Yet they ignore that one of the biggest contributors to increasing costs is actually government-imposed taxes and tariffs that raise the price of life-saving medicines.

For instance, in India the combined taxes and tariffs on imported medicines amount to 55 per cent; for China, the figure is 28 per cent. But this reality has not stopped governments from acting to undermine IP.

In early 2007, the then-Thai military government waived the patents of three patented medicines through a process called "compulsory licensing".

Compulsory licensing is an instrument recognized under the World Trade Organization's Trade-Related Aspects of Intellectual Property Rights (TRIPS) Agreement, which grants governments the ability to license the production of patented products "in the case of a national emergency or other circumstances of extreme urgency or in cases of public non-commercial use."

In the case of medicines, the provision was designed to ensure patented products could be mass-produced during serious public health emergencies. Yet the Thai government allowed a profit-making government agency to produce the medicines. It also issued a compulsory license for Plavix, a heart disease medication. A heart disease medication hardly fits within the criteria of "national emergency" or "extreme urgency."

The Thai government's actions went beyond any reasonable interpretation of the TRIPS Agreement. The military government instead used compulsory licensing as an opportunity to reduce public expenditure on health, diverting the savings to its own salaries and military budget.

Now the World Health Organization has waded into the debate. Last year, a WHO-designated team assessed the junta's actions and later issued a brief report legitimizing them, which was followed by a how-to guide for countries to waive their international obligations and issue compulsory licenses.

This report is feeding into a WHO-initiated Intergovernmental Working Group (IGWG) on Public Health, Innovation and IP formed in 2006. From its inception, the IGWG has been an attempt by health bureaucrats and the activists who advise them to rewrite – and undermine – global IP rules. The activists are now using their campaign against IP on medicines as a precedent to continue their assault on IP; and global warming has become the new battleground.

In a joint statement at the 2007 G8 Summit, the governments of Brazil, China, India, Mexico and South Africa called for an agreement to assist in compulsory licensing the IP related to carbon dioxide emission-mitigating technology being developed in wealthy countries.

In subsequent media reports the officials argued an agreement is needed "paralleling the successful agreement on compulsory licensing of pharmaceuticals."

Similar themes appeared in a resolution passed by the European Parliament in November last year recommending a study to assess amending TRIPS "to allow for the compulsory licensing of environmentally necessary technologies."

And the tragedy is that those who are likely to suffer most are the world's poor. They are the ones most likely to suffer from a lack of investment in essential medicines or the predicted consequences of not reducing carbon dioxide emissions.

Technology transfer is also vital for developing countries to grow their economies and improve their standards of health and the environment.

A 2006 World Bank study and a 1998 International Energy Agency/UNEP study have identified that strengthening IP rights assists in technology transfer.

The World Intellectual Property Organization has designated 2008 as the year for "celebrating innovation and promoting respect for IP."

With the IGWG convening in Geneva in a few days' time and the assault on IP on climate-friendly technologies, World IP Day is becoming an opportunity to reflect on IP's demise.
