Development

knowing when to walk away…

I don’t like to think of myself as a quitter; I especially don’t like to walk away from something I enjoy doing.  However, I’ve recently had to re-evaluate my workload (both personal and professional), and I realized that something had to give.  For me, that something was the Atlanta SQLSaturday 2011.

Now, before you panic, the project is still continuing without me; there’s still a team working very diligently to pull this together.  I’m just no longer working on it.

My reasons for stepping away were many:

  • My personal life is getting very complicated. I’m getting married this summer, and there are a lot of details left to work out.  On top of that, I have two wonderful kids from my first marriage.  My family deserves no less than my best, and that means I need to put them first.
  • Volunteer work should be fun work.  If you’re leaving a volunteer project more stressed than when you went in, something’s not right.  In my case, I was too invested in the SQLSaturday project to really allow it to grow; I have certain ideas about what should be done, and it’s time to let others take it in a different direction. 
  • Volunteer work should enhance your professional and personal skills.  In my case, I’ve let some of my technical goals slip because I’m investing too much time in volunteering.  I need to finish my technical certifications, for example.
  • There are other tasks to be done in the Atlanta SQL Server community.  I’ve spent the last few years working with SQLSaturday and AtlantaMDF; one of our original goals for SQLSaturday was to help feed the user group, and to be honest, that hasn’t happened.  Stepping back and looking objectively at the situation has made me realize that there’s some foundational work that needs to be done in the user group in order for it to truly benefit from community outreaches like SQLSaturday.

All of that being said, I’m still going to be doing volunteer work; I just need to make smarter investments of my time.  One of the harder lessons in life is knowing when to walk away from something in order to let it (and yourself) flourish.

SQLSaturday Atlanta 2011–advance notice

Last night a small group of us (Aaron Nelson, Audrey Hammonds, Julie Smith, Tim Radney, and me) met to discuss this year’s upcoming SQL Saturday; our goal is to make it a bigger event than last year, but still try to keep it very community-centric.  Here’s a couple of bullet points that I can tease you with now:

  • We’re still looking for a venue, but our hope is to have at least 7 tracks of content.  We know that we’re going to push for attendee numbers between 300 & 450 (limited seating in the Atlanta area).
  • Our tentative date is September 24; that’s right before PASS Summit.
  • We’re playing around with the idea of a pre-con on the Friday before: a deep dive at a low cost.
  • The event shirts WILL kick a$$.
  • We’re trying to figure out how to work with sponsors to make sure that they get a lot of value out of this; after all, they’re footing the bill, and we want to make sure they leave our event satisfied with the exposure they get.
  • If we can swing it, we’re thinking about one MAC-DADDY prize for attendees.  

A lot depends on finding the right location; good location = more attendee seating = more sponsor funds. Hopefully, we can secure something in the next week or so.

Keep you posted as things develop.

Something new for 2011: bunches of little stuff

OK, it’s the last day of March, and I’m phoning in this blog post.  It’s not that I haven’t spent a lot of time learning something new this month, it’s that I’ve got a pile of stuff to get through before leaving on vacation next week, and I really just don’t have more than 15 minutes to write up some of the things I’ve been working on.  So, here’s a short list with some links, and a commitment to do better next month.

Data Modeling

I spent a lot of time working on revisiting Data Modeling and Use Case Diagramming.  Although I’m still not a fan of UML, I have come to appreciate the benefit of simplifying the language we use to describe things to do.  I’ve been working a lot with someone who is very detail-oriented; as a conceptual person, it’s a challenge at times to bring those two paradigms together.

XQuery

I recently presented at SQL Saturday 70 on FLWOR, so I had to really brush up on my skills using XML.  I wanted to be able to answer all kinds of questions, so I did a deeper dive into the functions, and really focused on how to do some basic queries with XQuery in SQL Server.  I’m working on a blog series for this, but just haven’t found the time to put my fingers on the keyboard.

SSIS/StreamInsight

I’ve said it before; I’m probably the only ETL guy that uses SQL Server, but not SSIS.  A recent project at work caused me to have to build a small prototype using SSIS, and I learned quite a bit (and some newbie “gotchas”; again, I smell a blog post in the works).  Julie Smith, Rob Volk, and Andy Leonard all pointed me in the general direction of an interesting new product from Microsoft: StreamInsight.  I built the demos, and played with it, but I’ve got a long way to go before I can actually do something with it.

 

Anyway, sorry for the lack of insight; I need to dedicate more time to actually writing stuff down when I learn it, but perhaps that’s a lesson in and of itself.    

much delayed #sqlsat70 write-up

This write-up will be brief and to the point: SQL Saturday 70 rocked.  K. Brian Kelley (Blog|Twitter) and his team put together a great event (again), and it was a lot of fun catching up with so many SQL people.  Unfortunately, I had a rather severe sinus infection which kept me from really enjoying the event (in fact, I left after my sessions), but I did have a good time.

My sessions went very well; I had small crowds, but they were very involved.  I seem to keep picking esoteric topics (Data Architecture and XQuery), but the beauty of that is that I learn something new every time I start researching the subject.  One thing that stood out for me is that even though I pitched my sessions as Intermediate-level, I still had a lot of foundational material to cover.  I need to keep that in mind for future versions of my technical presentations.

General notes about the event: I have no complaints.  The space was wonderful, the speaker room was more than adequate, and the food was great.  The only thing that I think was missed was the same sin I’ve been guilty of at our AtlantaMDF events; the hosting organization wasn’t promoted enough.  We get so caught up in making sure the event flows smoothly that I think we forget to tout the monthly events enough.

Anyway, it was a great show.  If you attended my presentations, thank you; I hope you learned something.  The slides are available at the links below:

Data Architect: http://www.sqlsaturday.com/viewsession.aspx?sat=70&sessionid=3754

XQuery: http://www.sqlsaturday.com/viewsession.aspx?sat=70&sessionid=3755

#TSQL2sday: Emulating a FIRST aggregation

Jes Borland is hosting this month’s T-SQL Tuesday, and it’s all about aggregations.  Here’s an old coding trick of mine to emulate a FIRST aggregation in T-SQL.  Say we have a table that has three columns:

  • ID, a uniqueidentifier
  • Name, a varchar that represents something, and
  • DateStored, a datetime that is set when the row is written to the table

And we populate that table like so:

CREATE TABLE TSQL2sDay_FirstAgg
    (
      ID UNIQUEIDENTIFIER
    , NAME VARCHAR(20)
    , DateStored DATETIME DEFAULT GETUTCDATE()
    )

INSERT  INTO TSQL2sDay_FirstAgg
        ( ID, NAME )
VALUES  ( NEWID(), 'Peanut' )

WAITFOR DELAY '00:00:01'

INSERT  INTO TSQL2sDay_FirstAgg
        ( ID, NAME )
VALUES  ( NEWID(), 'Peanut' )

WAITFOR DELAY '00:00:01'

INSERT  INTO TSQL2sDay_FirstAgg
        ( ID, NAME )
VALUES  ( NEWID(), 'Orange' )

It’s easy to figure out the number of rows associated with each name:

-- SELECT data to verify order of DateStored
SELECT  ID
      , NAME
      , DateStored
FROM    TSQL2sDay_FirstAgg

-- Basic Row Count by Name
SELECT  NAME
      , RowCnt = COUNT(*)
FROM    TSQL2sDay_FirstAgg
GROUP BY NAME

but how do we figure out what the first ID was for each name along with the number of rows?  You could work something out using the HAVING clause of the SELECT statement, or you could do something like the following:

--SELECT first ID and count of rows by Name
SELECT  FirstID = CONVERT(UNIQUEIDENTIFIER,
                          RIGHT(MIN(CONVERT(VARCHAR(24), DateStored, 121)
                                    + CONVERT(VARCHAR(36), ID)), 36))
      , NAME
      , RowCnt = COUNT(*)
FROM    TSQL2sDay_FirstAgg
GROUP BY NAME

It looks complicated, but it’s not; let’s step through it.

  1. We have to know some basic information about our data; in this case, we know that the datetime value associated with each row with a common name is different.  In other words, there are no two Peanuts with the same DateStored value.  This is important, because in order for there to be a first value, there must be some method of ALWAYS determining which one WAS first.  If two Peanuts showed up at the same time, the model is broken.
  2. The first thing we do is to CONVERT the DateStored value to a varchar; this allows us to concatenate it with other values.  The format of that varchar string is important; it must be precise, and it must sort in an ascending order.  The ODBC canonical format (with milliseconds) is a good candidate for this.
  3. We then CONVERT the uniqueidentifier to a varchar, and append it to the DateStored varchar value.  This gives us a lengthy string which sorts by its leading date portion (style 121 produces 23 characters).
  4. We find the MIN of the string we constructed; because the string starts with the date in a sortable format, the smallest string belongs to the row with the earliest DateStored value.
  5. We then take the RIGHT-most 36 characters (the length of a uniqueidentifier), and convert it back to a uniqueidentifier (so that we have our type back).
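To see what the steps above are doing, it can help to look at the intermediate values before they are aggregated.  This is only a sketch against the sample table from this post; the column aliases (DateAsText, IdAsText, SortKey) are made up for illustration.

```sql
-- Show the pieces that get concatenated, and the string that MIN() compares.
SELECT  NAME
      , DateAsText = CONVERT(VARCHAR(24), DateStored, 121)  -- step 2: sortable date text
      , IdAsText   = CONVERT(VARCHAR(36), ID)               -- step 3: the GUID as text
      , SortKey    = CONVERT(VARCHAR(24), DateStored, 121)
                     + CONVERT(VARCHAR(36), ID)             -- the value passed to MIN
FROM    TSQL2sDay_FirstAgg
ORDER BY NAME
      , SortKey
```

The first row listed for each NAME carries the SortKey that MIN would pick, and its last 36 characters are the ID that the trick returns.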

There are probably better solutions for this, but this is a simple trick that works under certain circumstances and is portable to several flavors of SQL.
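One of those alternatives, sketched here on the assumption that you are on SQL Server 2005 or later (where ROW_NUMBER and the OVER clause are available), uses window functions instead of string concatenation.  It is not the trick described above, just a different route to the same result; the aliases RowNum and ranked are made up for illustration.

```sql
-- Number the rows within each NAME by DateStored, count the rows per NAME,
-- then keep only the earliest row for each NAME.
SELECT  FirstID = ID
      , NAME
      , RowCnt
FROM    ( SELECT    ID
                  , NAME
                  , RowNum = ROW_NUMBER() OVER ( PARTITION BY NAME ORDER BY DateStored )
                  , RowCnt = COUNT(*) OVER ( PARTITION BY NAME )
          FROM      TSQL2sDay_FirstAgg
        ) AS ranked
WHERE   RowNum = 1
```

The same caveat applies: if two rows with the same NAME share a DateStored value, which one counts as "first" is arbitrary.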

Resolution checkup

As February draws to a close, I thought I’d do a quick check-up to see how well I was keeping up with my New Year’s resolution list.  In sum: not great, but not too bad, either.  I need to make some adjustments, but I think I can pull it back in.

Here’s the rundown (copied and pasted from the original, with some notes below):

Professional

Technical Skills

  • I want to learn something new every month.  My goal is to tackle something challenging, and be able to understand the ins and outs of it within 30 days.  For example, I want to finish tackling XML (including XSD’s) in SQL Server. 

I think I’m doing OK on this one; I haven’t really done great this month, but I have spent a little time each month working on something new.

  • I want to upgrade my certifications by the end of the year; I’ve been dancing around the MCITP exams for a while, and I need to finish them.

Spent a little time studying, but I need to get on this.

Presentation

  • I want to make at least 6 technical presentations by the end of the year; last year, I managed to eke out 8, but given some of the recent changes in my personal life (see below), I think 6 is reasonable.

I have two presentations scheduled for SQL Saturday 70 next month.

  • I will blog at least once a month about some technical topic (see the first bullet point under technical skills).

See the above point; as I learn, I blog.  I did miss the T-SQL Tuesday blog for Feb (which makes me sad).

Management

  • I will understand the SCRUM methodology, and learn how to implement it with my team at work.  Although I’m not a team leader, I AM the Senior Database Architect, and I need to code less, and teach more.  This is my year to do so.

I’ve done this; I’m moving on to something larger. 

Personal

Health

  • I’m getting married again this year, and I want to look good for my new wife.  I also want to avoid long-term health issues.  I was losing weight last year (until I started dating), and I want to get back on track.  I’d like to lose 50 lbs by October.

Started Weight Watchers and have lost about 10 pounds so far.  Have tapered off a bit, and I need to get back on this bandwagon.

  • I have apnea, and I’ve been horrible about using my CPAP on a regular basis.  I will use it regularly.

How about irregularly?

  • I need to exercise more, so I will find 20 minutes a day to do SOMETHING, even if it’s just walking around the office for 20 minutes.

Blech.  I did OK for about two days.

  • I will drink at least 8 glasses of water per day.

Does Diet Coke count as water?  Sigh; it looks like I’m not doing so hot in the Health area.

Spiritual

  • I’ve slacked off in my religious activities; my faith was nourished by church attendance during my divorce, and I need to start growing again.  I will find a new church in the next two months (my old church is too far to drive on a regular basis), and become a regular attendee.

Checked out a church; didn’t like it.

  • I choose to absorb the goodness from people who love me, and I will reject the poison from those who do not.  I will focus on the important things in life (like my kids, and my future bride), and worry less about the unimportant things (like who’s mowing the grass).

Mixed results on this; while I think I do a great job at spending time with my kids and my future bride, I’m still struggling with ways to handle conflict in a positive fashion.  My strategy now is direct confrontation, rather than continuing to tap-dance around issues.

Social

  • I will listen more to my children, my family, and my friends.  I will find ways to let them know I love them.

See above.

  • I will nurture my own friendships; while I love my fiancee’s friends and family, I want to bring more to the table than just me.

Need to do better about this.

Financial

  • My divorce pulled me way off course.  While I’m a long way from being out of debt, I will continue to make strides in that area.  I will pay off at least one credit card ahead of schedule.

Not really making a lot of headway here;  this one may have to wait until my fiancee and I combine households (thus saving on rent payments).

  • I will save more; I plan to find ways to cut costs (like taking advantage of coupons, and eating out less).

Ditto.

There you have it; a mixed bag.  I think I’m making some positive steps in the right direction, but I’ve still got a long way to go.

What Should PASS be? #sqlpass

Andy Warren recently threw out a challenge for bloggers to “fix” things with the Professional Association for SQL Server in 3 years.   There have been some great responses so far (and I’m sorry if I’ve missed yours):

All of these posts have great ideas, and have influenced my thinking on the subject; over the last year I’ve had conversations with most of these authors (at Summit, at SQL Saturdays, over email, etc.) about some of the finer points of the direction that PASS should take.  The ideas that I’m going to post below are probably not too dissimilar from their thoughts (although we probably differ on some of the implementations of those ideas).

Heading off in a general direction…

Although Andy W. specifically asked for a 3-year plan, I think part of the problem with PASS is that the long-term vision is unclear.  There’s a big debate about whether or not PASS is a community organization, a business serving that community, or something else that’s not been well-defined.  Additionally, PASS struggles with its domain of influence; the organization is viewed as being U.S.-centric by most members outside of the states, and inside the states, the continued reliance on Microsoft’s presence in Seattle makes the organization seem distant to local users.  What should PASS be?

In a conversation with Andy W. a few months ago, I proposed that PASS should borrow from some of the great evangelistic traditions of Western civilization (I was originally thinking of a non-religious version of the five fold ministry of the early Christian church: apostles, prophets, evangelists, pastors, and teachers), and Andy threw out the word “guild”.  I like that concept; PASS should be a guild, providing training both in terms of learning about the tools (SQL Server and associated products) and growth in the guild (moving from a student to a master).  Guilds are both a community of learners, and a powerful force of influence; where the Summit goes, Microsoft should follow (instead of the other way around).  I think this thought echoes Grant’s call:

Get the word out that if you want training this is the place to be. If you want to be a trainer, this is the place to start, if you are a trainer, this is where you grow you brand.

Of course, that’s a long-term definitional goal; in the short term, I see three areas for improvement.

Things to do in the next three years…

1. Have an election process that’s deemed fair and reliable by the majority of the membership. 

I applaud PASS for taking steps in this regard.  I obviously spent a great deal of time discussing this over the last 10 months, and I’ve arrived at a very different place than either Andy Leonard or Mike Walsh (I believe in a strong Nominating Committee with an opaque application process; Andy has called to abandon it altogether, and Mike believes in a simple pass-or-fail review of credentials).  While our viewpoints on the actual implementation may differ, I think we can all agree that PASS will continue to lack credibility if the method by which organizational power is attained is not supported by the constituency.    PASS needs to get the election process stabilized and supported before the next election.

2. Adopt the User Groups as an extension of the organization, rather than just partners in community.

The PASS Chapter model is essentially a good one; there is no better way (in my opinion) to reach SQL Server professionals interested in building their careers than through the User Groups.  Unfortunately, as Mike (and others) have pointed out, the loose affiliation between PASS and the chapters has left many chapter leaders questioning what PASS really does for the chapters.  That needs to change.

Chapters should be the local arms of PASS; attendees to a chapter meeting should leave every meeting thinking that they are getting a monthly shot (albeit a smaller dosage) of the same knowledge that they get from a PASS SQLSaturday, a PASS SQLRally, and a PASS Summit.  Chapters should feel interconnected; as a chapter leader in Atlanta, I should know what topic TJay Belt is discussing in Utah, or what Roy Ernest is covering in Curaçao.   I should feel confident (as should they) that I have access to the same resources for educating my members (including trained, professional speakers as well as online materials) as any other chapter.

Chapters should also be given the tools necessary to recruit new members to the guild, both those members of the community with lots of experience with SQL Server (and little-to-none with PASS) as well as those members of the community who are still figuring out what a clustered index is.  I realize that this is a huge task to take on in 3 years, but the initial groundwork must be laid; chapters need to feel that they are part of a larger organization, and they should be embraced as siblings (not distant cousins).

As a sidebar, I should note that while PASS chapters should not replace the online initiatives that PASS has recently invested in (the blogosphere and social networks), they should be the primary focus.   From my own personal perspective, I’ve recently discovered that as I’ve become less “plugged in” (changes in my personal life as well as new corporate firewall policies have prevented my social networking),  it’s been harder to stay invested in PASS and the SQL community.  For example, I missed the recent call for volunteers for Program Committee members; I’ve also missed quite a few calls for bloggers (like T-SQL Tuesday).  There needs to be better connectedness between “meatspace” (a term I borrowed from Brent Ozar) and the online community.

3.  Invest in the IT structure at HQ.

We’re an organization of information technology professionals, and as far as I know, we have a staff of 2 IT guys (a developer and an admin).  If PASS is going to be the essential tool for the SQL Server Professional, then the organization needs to build an IT infrastructure that can support community connectedness, the sharing of essential information, networking between members, and training resources to move passive members to active masters of their craft.  I am not sure what that would take, but I think the speaker bureau (as well as a speaker training program) is a good start.  PASS doesn’t need to be a SQLServerpedia or a SQL Server Central, but it does need to provide its membership with an awareness of what good SQL Server resources are, and how they should be used in the educational path of the member.

Summing Up…

As I said before, I’m envisioning PASS as a guild for SQL Server professionals; guilds have members with varying skill levels (from apprentice to master craftsman), and the goal of the guild is to train its members not only in the tools they use, but also in the ways of the guild.  We’ve got a long way to go, but I think we have some basic steps we need to master, and soon.

A simple codebuilder for parsing in T-SQL

If you’ve ever tried to parse a wide character column in T-SQL, you know two things:

  1. It’s a pain to do, and
  2. It’s a pain to do.

A lot of the data I deal with comes in syslog format, which can come in one of two formats: positional (the location of the data element is related to the type of data), and named attributes (which usually only include delimiters for complex strings).  Although I haven’t had much luck automating positional parsing, I’ve recently begun using Excel to help me with the named attributes. 

Here’s an example; I have a table with a message column that is pulling over syslog data from a firewall.  In a given day, I may have millions of rows like the following:

sn=AA17D5028EAA time="2011-01-26 13:40:14 UTC" fw=10.1.100.1 pri=1 c=512 m=522 msg="Malformed or unhandled IP packet dropped" n=1 src=10.1.1.23:32795:X1: dst=10.1.1.1:514:: proto=udp/17

Note that each attribute of this particular syslog message is identified with an attribute name (eg, sn, time, fw, etc).  In order to break out each of the elements in T-SQL, we can split the string using a combination of SUBSTRING and CHARINDEX, like so:

SELECT TOP 1
        m = CONVERT(INT, SUBSTRING(MESSAGE, CHARINDEX(' m=', MESSAGE) + 3,
                                   CHARINDEX(' ', MESSAGE, CHARINDEX(' m=', MESSAGE) + 3)
                                   - ( CHARINDEX(' m=', MESSAGE) + 3 )))
      , time = CONVERT(DATETIME, SUBSTRING(MESSAGE, CHARINDEX(' time="', MESSAGE) + 7,
                                           CHARINDEX('UTC"', MESSAGE, CHARINDEX(' time="', MESSAGE) + 7)
                                           - ( CHARINDEX(' time="', MESSAGE) + 7 )))
      , fw = CONVERT(VARCHAR(20), SUBSTRING(MESSAGE, CHARINDEX(' fw=', MESSAGE) + 4,
                                            CHARINDEX(' ', MESSAGE, CHARINDEX(' fw=', MESSAGE) + 4)
                                            - ( CHARINDEX(' fw=', MESSAGE) + 4 )))
FROM    syslogng (NOLOCK)

Note the repetition for each column; you need to find the position of a starting delimiter, the position of an ending delimiter, and supply to the SUBSTRING function the position of the starting delimiter, and the difference between the two.  You also need to determine the length of the starting identifier, and then I CONVERT to a specific data type.  Whee!
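To make the pattern easier to see, here is the same arithmetic broken into named steps for a single attribute, run against the sample message as a literal rather than the table.  This is only an illustration; the variable names (@msg, @start, @end, @valueStart, @valueEnd) are made up for this sketch.

```sql
DECLARE @msg VARCHAR(500)
DECLARE @start VARCHAR(20)
DECLARE @end VARCHAR(20)
DECLARE @valueStart INT
DECLARE @valueEnd INT

SET @msg = 'sn=AA17D5028EAA time="2011-01-26 13:40:14 UTC" fw=10.1.100.1 pri=1 c=512 m=522 msg="Malformed or unhandled IP packet dropped" n=1 src=10.1.1.23:32795:X1: dst=10.1.1.1:514:: proto=udp/17'
SET @start = ' fw='   -- starting delimiter
SET @end = ' '        -- ending delimiter

-- Position of the first character after the starting delimiter.
SET @valueStart = CHARINDEX(@start, @msg) + LEN(@start)

-- Position of the first ending delimiter at or after that point.
SET @valueEnd = CHARINDEX(@end, @msg, @valueStart)

-- SUBSTRING takes a start position and a length (the difference between the two).
SELECT  fw = CONVERT(VARCHAR(20), SUBSTRING(@msg, @valueStart, @valueEnd - @valueStart))
-- returns 10.1.100.1
```

In a set-based query against the table there are no variables to lean on, which is why every CHARINDEX call ends up repeated inline for each column.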

It gets even more fun when the attributes are optional; some syslog messages may have a proto code, and some may not.   When faced with this, you need to include a CASE option, like so:

SELECT TOP 1
        proto = CONVERT(VARCHAR(20), CASE WHEN CHARINDEX(' proto=', MESSAGE) = 0 THEN NULL
                                          ELSE SUBSTRING(MESSAGE, CHARINDEX(' proto=', MESSAGE) + 7,
                                                         CHARINDEX(' ', MESSAGE, CHARINDEX(' proto=', MESSAGE) + 7)
                                                         - ( CHARINDEX(' proto=', MESSAGE) + 7 ))
                                     END)
FROM    syslogng (NOLOCK)
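One thing to watch for with this pattern: if an attribute is the last one in the message and nothing follows it (no trailing space), CHARINDEX returns 0 for the ending delimiter, the computed length goes negative, and SUBSTRING raises an error.  A possible guard, sketched here against a shortened, made-up message, is to search a copy of the string with the ending delimiter appended.

```sql
DECLARE @msg VARCHAR(500)

-- Shortened sample; proto is the last attribute and has no trailing space.
SET @msg = 'pri=1 c=512 m=522 n=1 proto=udp/17'

SELECT  proto = CONVERT(VARCHAR(20), CASE WHEN CHARINDEX(' proto=', @msg) = 0 THEN NULL
                                          ELSE SUBSTRING(@msg, CHARINDEX(' proto=', @msg) + 7,
                                                         -- Search @msg + ' ' so an ending delimiter is always found.
                                                         CHARINDEX(' ', @msg + ' ', CHARINDEX(' proto=', @msg) + 7)
                                                         - ( CHARINDEX(' proto=', @msg) + 7 ))
                                     END)
-- returns udp/17
```

Whether you need this depends on your data; if every message is known to end with a delimiter, the simpler form works as-is.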

 

One of our developers is working on a syslog parser in .NET code, but I needed a proof-of-concept, and I didn’t want to keep cutting and pasting to see if it was working.  Looking at the parsing, it’s very formulaic SQL.  When I think formulas, I think Excel, and so I whipped out the following:

[Screenshot: the Excel worksheet]

Note that I have several input columns:

  • start, the starting delimiter
  • end, the ending delimiter (usually a space)
  • colname, the column name I want to use; usually the same as start, but stripped of extra characters.
  • type, the SQL type I want to convert the data to, and
  • optional, a column to decide if the attribute is optional per row or not.

I also have a hidden column (column F), which generates most of the SQL code:

=CONCATENATE("SUBSTRING(message, CHARINDEX('", A2, "', message)+ ", LEN(A2), ", CHARINDEX('", B2, "', message, CHARINDEX('", A2, "', message)+", LEN(A2), ") - (CHARINDEX('", A2, "', message)+", LEN(A2), "))")

This takes the starting and ending delimiters, the length of the starting delimiter, and plugs those values into a valid SQL statement.  I then create a SQL column, using the following formula:

=CONCATENATE(", ", C2, " = CONVERT(", D2, ", ",  IF(E2="Y", CONCATENATE("CASE WHEN CHARINDEX('", A2, "', message) = 0 THEN NULL ELSE ",F2, " END"), F2), ")")

If I were better at Excel, I’d use named ranges, but for my purposes, this is OK.   I prepend a comma and the column name, specify the type, and include a CASE statement based on whether or not my optional column includes a “Y”.
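If you’d rather stay in Management Studio, the same idea can be sketched in T-SQL itself: put the spreadsheet’s input columns in a table variable and build the SQL text with string concatenation.  This is a rough equivalent and not what I actually used; the table variable @attrs and its column names are made up for illustration.

```sql
-- Hypothetical attribute list; mirrors the spreadsheet's input columns.
DECLARE @attrs TABLE
    (
      startDelim VARCHAR(20)   -- "start"
    , endDelim VARCHAR(20)     -- "end"
    , colname VARCHAR(30)      -- "colname"
    , sqltype VARCHAR(30)      -- "type"
    , isOptional CHAR(1)       -- "optional" (Y or N)
    )

INSERT  INTO @attrs VALUES  ( ' m=', ' ', 'm', 'INT', 'N' )
INSERT  INTO @attrs VALUES  ( ' proto=', ' ', 'proto', 'VARCHAR(20)', 'Y' )

-- One row of generated SQL text per attribute.
SELECT  SqlText = ', ' + colname + ' = CONVERT(' + sqltype + ', '
                  + CASE WHEN isOptional = 'Y'
                         THEN 'CASE WHEN CHARINDEX(''' + startDelim
                              + ''', message) = 0 THEN NULL ELSE ' + expr + ' END'
                         ELSE expr
                    END + ')'
FROM    ( -- expr plays the role of the hidden column F.
          SELECT    colname
                  , sqltype
                  , isOptional
                  , startDelim
                  , expr = 'SUBSTRING(message, ' + pos + ', CHARINDEX(''' + endDelim
                           + ''', message, ' + pos + ') - (' + pos + '))'
          FROM      ( -- pos is the text for "first character after the starting delimiter".
                      SELECT    *
                              , pos = 'CHARINDEX(''' + startDelim + ''', message) + '
                                      + CONVERT(VARCHAR(10), DATALENGTH(startDelim))
                      FROM      @attrs
                    ) AS p
        ) AS e
```

DATALENGTH is used instead of LEN so that a starting delimiter ending in a space is still counted at its full length.  The output rows can be pasted into a SELECT (dropping the leading comma on the first one).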

It took me longer to write this blog post than it did to generate a proof-of-concept, parsing each of the named attributes out from a syslog message.

I’m doing it wrong…

At some point in your career, you have to realize that you’re going about it in the wrong way.   It may hit you like a ton of bricks, or it might be a subtle realization, but either way you realize that things aren’t working out for you like you expected.  I’ve had a couple of those moments throughout my career; one was shortly after I flunked out of graduate school.  Nothing says “you’re doing it wrong” like sitting outside of your advisor’s office for a meeting that never happens.

I’ve had other epiphanies in my career, such as the time when my ethical standards were a little higher than my employer’s; when I got sent home by a GM after a discussion over my responsibilities, I started polishing my resume.   I was doing it wrong by working for the wrong company.

Recently, I’ve begun to realize that I’m not living up to my full potential in my career.  I’ve spent the last several years building an enterprise solution for my company that has become the core product of that company.  It’s a good product, and I’m proud of it.  However, like many small companies that have grown up fast,  our company is built on a complex ecosystem of ever-changing goals and feature requests.  We built a system based on assumptions, and we’ve become one of the leaders of our industry because we’re often the first to deliver a product for a niche market.  Many of the assumptions we made didn’t pan out, and the applications we’ve built have slowly degenerated into a mass of tangled wire and unrealistic expectations.  I realized this as I’ve struggled to add a new feature and retrofit it into this existing solution; it’s taking more and more time to solve development problems because we’re not sure what features are still being used by some employee in a dark corner of the building.

As I was rewriting a stored procedure for the fifth time trying to eke out a few more milliseconds of performance, I realized that I was thinking like an engineer.  Engineers find creative solutions to problems in a very hands-on way; they worry about wiring things together so that they work, and they work well.  Engineers are worried about the microcosm; as every geek’s favorite engineer (Scotty from Star Trek) would say “In four hours, the ship blows up.”  That’s pretty straightforward; under condition x, outcome y is to be expected in a certain amount of time.

The problem?  My title says Architect.  I’m supposed to be thinking about the big picture, not just how a couple of applications are wired together.  I’m supposed to understand (and enforce) the rules about how events become data, and how data becomes information.  I should be more concerned with defining the specifications for our system than trying to figure out this damned stored procedure (for the fifth time).  Maybe we shouldn’t even have this particular stored procedure; maybe with a little tweaking, we could eliminate the problem altogether.

So what does this mean for me?  Well, as part of my New Year’s resolutions, I’ve been determined to learn something new every month.  This month, I’ve been focused on what it means to be a Data Architect, and I’ve been trying to find a little time every day to transform myself from an engineer to an architect.  I’m not going to master all of these subjects at once, but here’s my working list (from high-level goals to specific action items).  I expect this list to evolve, but it’s a start.

High Level Goal: A Data Architect needs to establish the standards for information and data in the enterprise.

  • I need to document the information architecture of our division of the company, using a standard data flow diagram notation.  I need to spend some time daily refreshing my memory on that notation.
  • I need to spend time with employees throughout the organization, discovering what the business entities are, and what the vocabulary for those entities is.
  • After discovery, I need to publish a standard vocabulary document and data-dictionary, showing how we capture that information today:
    • I need to propose changes to our business vocabulary, and
    • I need to propose changes to our database schema to standardize our notation.

High Level Goal: A Data Architect needs to understand the nature of the enterprise’s information on all levels: physical, logical, and procedural.

  • I need to talk to our production DBAs and understand how our database servers are set up physically, including the clustering structure, the drive arrays, the SAN, etc.
  • I need to talk to our engineers to understand how data gets to the databases.
  • I need to talk to our product owners to understand what information they want from the data, and what’s the best way to deliver it.

High Level Goal: A Data Architect needs to recommend the best architecture for information management, including a plan on how to get there from here.

  • I need to refresh my memory on all aspects of SQL Server, not just the parts I use on a daily basis.
  • After discovery, I need to recommend ways to improve efficiency in our data capture processes.
  • I need to listen to all voices in the organization, even those I don’t normally agree with.  I can’t afford to throw away good ideas simply because I don’t always like the originator of those ideas.

More to come, but this is what I’ve been working on so far this month (February 2011).