New Tech Report from Microsoft Research: Strongly-Typed Language Support for Internet-Scale Information Sources


I’m very pleased to announce that Microsoft Research have published a new technical report related to F# 3.0 called Strongly-Typed Language Support for Internet-Scale Information Sources (or go straight to the PDF).

To reference this work, please cite MSR technical report number MSR-TR-2012-101.

Abstract: A growing trend in both the theory and practice of programming is the interaction between programming and rich information spaces. From databases to web services to the semantic web to cloud-based data, the need to integrate programming with heterogeneous, connected, richly structured, streaming and evolving information sources is ever-increasing. Most modern applications incorporate one or more external information sources as integral components. Providing strongly typed access to these sources is a key consideration for strongly-typed programming languages, to ensure low impedance mismatch in information access. At this scale, information integration strategies based on library design and code generation are manual, clumsy, and do not handle the internet-scale information sources now encountered in enterprise, web and cloud environments. In this report we describe the design and implementation of the type provider mechanism in F# 3.0 and its applications to typed programming with web ontologies, web-services, systems management information, database mappings, data markets, content management systems, economic data and hosted scripting. Type soundness becomes relative to the soundness of the type providers and the schema change in information sources, but the role of types in information-rich programming tasks is massively expanded, especially through tooling that benefits from rich types in explorative programming.

Introduction.

A key direction for the future evolution of programming is to allow strongly typed programming to “escape the box” of type structures defined in hand-written or tool-generated code, and to systematically bridge the gap between the language and the schematized information found in external information systems. In this report:

  • We describe the design and implementation of a novel type-bridging mechanism, the type provider mechanism in F# 3.0.
  • We describe its applications to strongly typed programming with web ontologies, web-services, database mappings, directory navigation, content management systems, scientific data sets and hosted scripting.
  • We consider the tradeoffs of these mechanisms, including the relative soundness properties of the different systems that may be designed and implemented.
  • We describe how type-bridging not only radically expands the role for names and types, but also challenges existing, comfortable assumptions about what types are, how they are selected and what properties they should have.
  • We illustrate the relative ease-of-use of the type provider mechanism as compared to alternate technologies, in addition to its performance and scaling benefits.

While we have made valuable initial progress for supporting information-rich applications, we believe that this area is an excellent opportunity for future language and tooling research, information-space modeling, schematization techniques, and language usability efforts.

This report is structured as follows. In Section 2, we consider the problem of information-rich programming, especially in the context of strongly-typed languages. Section 3 presents the type provider mechanism and explains its role in addressing information-rich programming problems, and Section 4 looks at specific examples of using the mechanism to integrate “internet-scale” information sources. Section 5 looks at themes that arise when using the type provider mechanism in practice, many of which raise interesting future R&D directions. In Section 6, we briefly describe how information-rich programming can affect our view of the logical characteristics usually associated with programming languages such as type-soundness. In Section 7 we describe other applications we have explored with the type provider mechanism, and in Section 8 we summarize, describe related work and future directions.

Enjoy!

Comments (6)

  1. MikeGale says:

    Reading the report.  Sounds like it answers some major concerns of mine.  Thanks very much.

    On the provider for strongly typed tabular data (CSV) you mention three approaches.  (In header, inference and "define in code", as ways to obtain type information.)  There is another approach which I remember using a few years ago.  It left a lasting impression, after I used it to solve an unexpected problem, quickly and with little pain.  It's a schema file that can be used to describe a set of CSV files.  It has the benefit of not intruding into individual file headers.  I'd recommend having a look at that approach as an additional (and powerful) way to define types and units for the CSV provider.

  2. MikeGale says:

    Two more thoughts on providers:

    1)  When I use .NET RegEx's seriously I use the comment mechanism.  (?#…)  Without that I find that RegEx's can become a bit like APL code.  Write Once, Never Read Again!  This mechanism might provide a way for RegEx writers to add more, like units of measure, to RegEx's.

    2)  For a long time I've wanted to make use of matrix computation languages like J (or APL) from .NET code.  These approaches are insanely great where they fit, but the interop (last time I checked) wasn't good enough.  Could a provider cross that gap without breaking the programmer's flow of thought?

    Hopefully that's the end of these thoughts sparked by reading that excellent paper.

  3. MikeGale says:

    Yesterday I was wondering about a NetCDF type provider.  At the back of my mind I remembered a reference somewhere.  It's in the report, in case anybody else is wondering the same.

    <Extract>

    7 Further Applications

    In the course of our experiments with type providers, we have mainly explored information spaces which fit into a few broad categories:

    1. Remote data sources such as databases and web-based services.
    2. Structured file formats: such as Excel, CSV, TSV or netCDF.
    3. DSL texts, such as regular expressions (where named groups provide the structure) or printf-style format strings (where placeholders provide the structure).
    4. Code providers such as providers for interoperating with R or Python.

    </Extract>

    I haven't seen the netCDF provider released yet.

  4. Name says:

    Although I love the idea of F# type providers, my first serious attempt to use them crashed hard.

    I was going to connect to a service (WCF) with WsdlService<"http://someurl/some.svc?wsdl">

    It fails epically with:

    The type provider 'Microsoft.FSharp.Data.TypeProviders.DesignTime.DataProviders' reported an error:
    tmp6E6C.cs(9409,26): error CS0644: 'System.ComponentModel.PropertyChangedEventHandler' cannot derive from special class 'System.MulticastDelegate'
    c:\Windows\Microsoft.NET\Framework\v4.0.30319\mscorlib.dll: (Location of symbol related to previous error)

    and a lot of other warnings which probably are not relevant:

    tmp6E6C.cs(290,28): warning CS0436: The type 'System.Data.DataRowState' in 'c:\Users\someuser\AppData\Local\Temp\tmp6E6C.cs' conflicts with the imported type 'System.Data.DataRowState' in 'c:\Windows\Microsoft.NET\Framework\v4.0.30319\System.Data.dll'. Using the type defined in 'c:\Users\someuser\AppData\Local\Temp\tmp6E6C.cs'.
    tmp6E6C.cs(9427,17): (Location of symbol related to previous warning)

    Is this a known feature 😉 or am I using it wrong?